DNA methylation quantity is expressed as \(\beta\)
\[ \beta = M/(M + U) \]
Where M = hybridization signal from a methylated version of a cytosine nucleotide and U = hybridization signal from an unmethylated version of a cytosine nucleotide. Beta can be interpreted as the proportion of methylation signal for a probe, and values range from 0 to 1. Beta is easy to interpret for humans, but typically has a bimodal distribution that is suboptimal for statistical modeling. Thus, we analyze M-values, which are another way to express methylation values for probes.
\[ M-value = log_2(M/U) \]
EPIC array has type 1 and type 2 probes. Type 1 probes are from the old 27K array that uses 2 bead types per CpG. The type I probes are labeled I-green and I-red. The newer type 2 probes uses one bead type. Most of the EPIC array probes are type II.
A detection probability represents the probability of a detected signal being background flourescence. If the probability is high, the signal is more likely to be background, and the value should be set to missing.
Standard workflows suggest to remove data points with detection P-value > 0.05.
EWAS array Illumina EPIC 850
Sesame is a bioconductor package
knitr::opts_chunk$set(cache.lazy = FALSE)
Installation of sesame and dependencies went fine, no errors or warnings.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("sesame", lib = "~/Rlibs")
Installed sesameData as a dependency.
Can check version of loaded packages with sessionInfo()
Sesame version 1.6 used.
This includes libraries for Sesame and Minfi.
library(tidyverse)
library(readxl)
library(knitr)
library(sesame)
library(wheatmap)
library(multtest)
library(limma)
library(RColorBrewer)
library(minfi)
library(IlluminaHumanMethylationEPICmanifest)
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
Set global variables.
dir_out <- "../results/"
Sesame pipeline involves probe masking with pOOBAH, normalization with noob, and nonlinear dye bias correction.
Noob = background subtraction based on normal-exponential deconvolution using out-of-band probes.
The pipeline can be thought of as adding pOOBAH to noob. Then also the non-linear dye bias correction should be better than minfi’s linear correction.
OpenSesame takes 12 minutes, so run it once and save the output.
IDATprefixes <- searchIDATprefixes(dir.name = "../data/raw/idatFiles/")
t1 <- Sys.time()
betas <- openSesame(IDATprefixes)
Sys.time() - t1
## Time difference of 11.92549 mins
#Warnings issued. These are also shown in minfi vignette, so they are expected.
#50: In readChar(con, nchars = n) : truncating string with embedded nuls
#Convert Betas to Mvalues. Check if there are beta values that will result in infinite results.
sum(is.na(betas))
## [1] 4462814
sum(is.na(betas))/length(betas)
## [1] 0.1610579
sum(betas==0 & !is.na(betas))
## [1] 0
sum(betas==1 & !is.na(betas))
## [1] 0
sum(betas<0 & !is.na(betas))
## [1] 0
sum(betas>1 & !is.na(betas))
## [1] 0
Mvals <- BetaValueToMValue(betas)
#column names for betas and Mvals is currently the Basename. Rename it to sampleID to make it easier to understand.
sampsheet <- read_csv("../data/raw/idatFiles/SampleSheet.csv")
#Are there true discrepancies between ExternalSampleID, SampleLabel, and Description?
#row 2 ExternalSampleID shows "+P" but SampleLabel shows "-P" and Description shows "no pyruvate". Many discrepancies like that. I ended up trusting SampleLabel and Description, but is that wrong?
cbind(sampsheet$ExternalSampleID, sampsheet$SampleLabel, sampsheet$Description)
## [,1] [,2] [,3]
## [1,] "C -P 1" "C -P 1" "Non-senescent mtDNA-containing controls no pyruvate"
## [2,] "C +P 1" "C -P 2" "Non-senescent mtDNA-containing controls no pyruvate"
## [3,] "mtD -P 1" "C -P 3" "Non-senescent mtDNA-containing controls no pyruvate"
## [4,] "mtD +P 1" "C -P 4" "Non-senescent mtDNA-containing controls no pyruvate"
## [5,] "C -P 2" "C -P 5" "Non-senescent mtDNA-containing controls no pyruvate"
## [6,] "C +P 2" "C -P 6" "Non-senescent mtDNA-containing controls no pyruvate"
## [7,] "mtD -P 2" "C +P 1" "Non-senescent mtDNA-containing controls with pyruvate"
## [8,] "mtD +P 2" "C +P 2" "Non-senescent mtDNA-containing controls with pyruvate"
## [9,] "C -P 3" "C +P 3" "Non-senescent mtDNA-containing controls with pyruvate"
## [10,] "C +P 3" "C +P 4" "Non-senescent mtDNA-containing controls with pyruvate"
## [11,] "mtD -P 3" "C +P 5" "Non-senescent mtDNA-containing controls with pyruvate"
## [12,] "mtD +P 3" "C +P 6" "Non-senescent mtDNA-containing controls with pyruvate"
## [13,] "C -P 4" "mtD -P 1" "Senescent mtDNA-depleted cells no pyruvate"
## [14,] "C +P 4" "mtD -P 2" "Senescent mtDNA-depleted cells no pyruvate"
## [15,] "mtD -P 4" "mtD -P 3" "Senescent mtDNA-depleted cells no pyruvate"
## [16,] "mtD +P 4" "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"
## [17,] "C -P 5" "mtD -P 5" "Senescent mtDNA-depleted cells no pyruvate"
## [18,] "C +P 5" "mtD -P 6" "Senescent mtDNA-depleted cells no pyruvate"
## [19,] "mtD -P 5" "mtD +P 1" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [20,] "mtD +P 5" "mtD +P 2" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [21,] "C -P 6" "mtD +P 3" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [22,] "C +P 6" "mtD +P 4" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [23,] "mtD -P 6" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [24,] "mtD +P 6" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [25,] "- ATV 1" "- ATV" "Non-senescent DMSO-treated control cells"
## [26,] "+ ATV 1" "- ATV" "Non-senescent DMSO-treated control cells"
## [27,] "- ATV 2" "- ATV" "Non-senescent DMSO-treated control cells"
## [28,] "+ ATV 2" "- ATV" "Non-senescent DMSO-treated control cells"
## [29,] "- ATV 3" "+ ATV" "Senescent ATV-treated cells"
## [30,] "+ ATV 3" "+ ATV" "Senescent ATV-treated cells"
## [31,] "- ATV 4" "+ ATV" "Senescent ATV-treated cells"
## [32,] "+ ATV 4" "+ ATV" "Senescent ATV-treated cells"
sum(colnames(betas) != sampsheet$Basename) #not in same order
## [1] 16
cbind(colnames(betas),sampsheet$Basename) #visually show not in same order
## [,1] [,2]
## [1,] "203833040074_R01C01" "203839170002_R01C01"
## [2,] "203833040074_R02C01" "203839170002_R02C01"
## [3,] "203833040074_R03C01" "203839170002_R03C01"
## [4,] "203833040074_R04C01" "203839170002_R04C01"
## [5,] "203833040074_R05C01" "203839170002_R05C01"
## [6,] "203833040074_R06C01" "203839170002_R06C01"
## [7,] "203833040074_R07C01" "203839170002_R07C01"
## [8,] "203833040074_R08C01" "203839170002_R08C01"
## [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203839170002_R01C01" "203833040074_R01C01"
## [26,] "203839170002_R02C01" "203833040074_R02C01"
## [27,] "203839170002_R03C01" "203833040074_R03C01"
## [28,] "203839170002_R04C01" "203833040074_R04C01"
## [29,] "203839170002_R05C01" "203833040074_R05C01"
## [30,] "203839170002_R06C01" "203833040074_R06C01"
## [31,] "203839170002_R07C01" "203833040074_R07C01"
## [32,] "203839170002_R08C01" "203833040074_R08C01"
cbind(colnames(betas),sampsheet$Basename[match(sampsheet$Basename, colnames(betas))])#now in correct order
## [,1] [,2]
## [1,] "203833040074_R01C01" "203833040074_R01C01"
## [2,] "203833040074_R02C01" "203833040074_R02C01"
## [3,] "203833040074_R03C01" "203833040074_R03C01"
## [4,] "203833040074_R04C01" "203833040074_R04C01"
## [5,] "203833040074_R05C01" "203833040074_R05C01"
## [6,] "203833040074_R06C01" "203833040074_R06C01"
## [7,] "203833040074_R07C01" "203833040074_R07C01"
## [8,] "203833040074_R08C01" "203833040074_R08C01"
## [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203839170002_R01C01" "203839170002_R01C01"
## [26,] "203839170002_R02C01" "203839170002_R02C01"
## [27,] "203839170002_R03C01" "203839170002_R03C01"
## [28,] "203839170002_R04C01" "203839170002_R04C01"
## [29,] "203839170002_R05C01" "203839170002_R05C01"
## [30,] "203839170002_R06C01" "203839170002_R06C01"
## [31,] "203839170002_R07C01" "203839170002_R07C01"
## [32,] "203839170002_R08C01" "203839170002_R08C01"
#reorder sampsheet to match sample order in data matrix
sampsheet <- sampsheet[match(sampsheet$Basename, colnames(betas)),]
sampsheet <- dplyr::rename(sampsheet, sampleID = ExternalSampleID)
cbind(sampsheet$sampleID, sampsheet$SampleLabel, sampsheet$Description)
## [,1] [,2] [,3]
## [1,] "- ATV 1" "- ATV" "Non-senescent DMSO-treated control cells"
## [2,] "+ ATV 1" "- ATV" "Non-senescent DMSO-treated control cells"
## [3,] "- ATV 2" "- ATV" "Non-senescent DMSO-treated control cells"
## [4,] "+ ATV 2" "- ATV" "Non-senescent DMSO-treated control cells"
## [5,] "- ATV 3" "+ ATV" "Senescent ATV-treated cells"
## [6,] "+ ATV 3" "+ ATV" "Senescent ATV-treated cells"
## [7,] "- ATV 4" "+ ATV" "Senescent ATV-treated cells"
## [8,] "+ ATV 4" "+ ATV" "Senescent ATV-treated cells"
## [9,] "C -P 3" "C +P 3" "Non-senescent mtDNA-containing controls with pyruvate"
## [10,] "C +P 3" "C +P 4" "Non-senescent mtDNA-containing controls with pyruvate"
## [11,] "mtD -P 3" "C +P 5" "Non-senescent mtDNA-containing controls with pyruvate"
## [12,] "mtD +P 3" "C +P 6" "Non-senescent mtDNA-containing controls with pyruvate"
## [13,] "C -P 4" "mtD -P 1" "Senescent mtDNA-depleted cells no pyruvate"
## [14,] "C +P 4" "mtD -P 2" "Senescent mtDNA-depleted cells no pyruvate"
## [15,] "mtD -P 4" "mtD -P 3" "Senescent mtDNA-depleted cells no pyruvate"
## [16,] "mtD +P 4" "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"
## [17,] "C -P 5" "mtD -P 5" "Senescent mtDNA-depleted cells no pyruvate"
## [18,] "C +P 5" "mtD -P 6" "Senescent mtDNA-depleted cells no pyruvate"
## [19,] "mtD -P 5" "mtD +P 1" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [20,] "mtD +P 5" "mtD +P 2" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [21,] "C -P 6" "mtD +P 3" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [22,] "C +P 6" "mtD +P 4" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [23,] "mtD -P 6" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [24,] "mtD +P 6" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [25,] "C -P 1" "C -P 1" "Non-senescent mtDNA-containing controls no pyruvate"
## [26,] "C +P 1" "C -P 2" "Non-senescent mtDNA-containing controls no pyruvate"
## [27,] "mtD -P 1" "C -P 3" "Non-senescent mtDNA-containing controls no pyruvate"
## [28,] "mtD +P 1" "C -P 4" "Non-senescent mtDNA-containing controls no pyruvate"
## [29,] "C -P 2" "C -P 5" "Non-senescent mtDNA-containing controls no pyruvate"
## [30,] "C +P 2" "C -P 6" "Non-senescent mtDNA-containing controls no pyruvate"
## [31,] "mtD -P 2" "C +P 1" "Non-senescent mtDNA-containing controls with pyruvate"
## [32,] "mtD +P 2" "C +P 2" "Non-senescent mtDNA-containing controls with pyruvate"
sampsheet$sampleID[1:8] <- paste0("ATV", 1:8)
sampsheet$sampleID[9:32] <- paste0("midas", 1:24)
cbind(sampsheet$sampleID, sampsheet$SampleLabel, sampsheet$Description)
## [,1] [,2] [,3]
## [1,] "ATV1" "- ATV" "Non-senescent DMSO-treated control cells"
## [2,] "ATV2" "- ATV" "Non-senescent DMSO-treated control cells"
## [3,] "ATV3" "- ATV" "Non-senescent DMSO-treated control cells"
## [4,] "ATV4" "- ATV" "Non-senescent DMSO-treated control cells"
## [5,] "ATV5" "+ ATV" "Senescent ATV-treated cells"
## [6,] "ATV6" "+ ATV" "Senescent ATV-treated cells"
## [7,] "ATV7" "+ ATV" "Senescent ATV-treated cells"
## [8,] "ATV8" "+ ATV" "Senescent ATV-treated cells"
## [9,] "midas1" "C +P 3" "Non-senescent mtDNA-containing controls with pyruvate"
## [10,] "midas2" "C +P 4" "Non-senescent mtDNA-containing controls with pyruvate"
## [11,] "midas3" "C +P 5" "Non-senescent mtDNA-containing controls with pyruvate"
## [12,] "midas4" "C +P 6" "Non-senescent mtDNA-containing controls with pyruvate"
## [13,] "midas5" "mtD -P 1" "Senescent mtDNA-depleted cells no pyruvate"
## [14,] "midas6" "mtD -P 2" "Senescent mtDNA-depleted cells no pyruvate"
## [15,] "midas7" "mtD -P 3" "Senescent mtDNA-depleted cells no pyruvate"
## [16,] "midas8" "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"
## [17,] "midas9" "mtD -P 5" "Senescent mtDNA-depleted cells no pyruvate"
## [18,] "midas10" "mtD -P 6" "Senescent mtDNA-depleted cells no pyruvate"
## [19,] "midas11" "mtD +P 1" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [20,] "midas12" "mtD +P 2" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [21,] "midas13" "mtD +P 3" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [22,] "midas14" "mtD +P 4" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [23,] "midas15" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [24,] "midas16" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [25,] "midas17" "C -P 1" "Non-senescent mtDNA-containing controls no pyruvate"
## [26,] "midas18" "C -P 2" "Non-senescent mtDNA-containing controls no pyruvate"
## [27,] "midas19" "C -P 3" "Non-senescent mtDNA-containing controls no pyruvate"
## [28,] "midas20" "C -P 4" "Non-senescent mtDNA-containing controls no pyruvate"
## [29,] "midas21" "C -P 5" "Non-senescent mtDNA-containing controls no pyruvate"
## [30,] "midas22" "C -P 6" "Non-senescent mtDNA-containing controls no pyruvate"
## [31,] "midas23" "C +P 1" "Non-senescent mtDNA-containing controls with pyruvate"
## [32,] "midas24" "C +P 2" "Non-senescent mtDNA-containing controls with pyruvate"
translate_trt <- function(ch){
switch(ch,
"Non-senescent DMSO-treated control cells" = "noSen_ATV",
"Non-senescent mtDNA-containing controls no pyruvate" = "noSen_Mt_noPy",
"Non-senescent mtDNA-containing controls with pyruvate" = "noSen_Mt_Py",
"Non-senescent mtDNA-depleted controls cells with pyruvate" = "noSen_noMt_Py",
"Senescent ATV-treated cells" = "sen_ATV",
"Senescent mtDNA-depleted cells no pyruvate" = "sen_noMt_noPy"
)
}
sampsheet <- sampsheet %>%
mutate(treatment = map_chr(Description, translate_trt)) %>%
mutate(treatment = factor(treatment))
cbind(sampsheet$sampleID, sampsheet$SampleLabel, sampsheet$Description, as.character(sampsheet$treatment))
## [,1] [,2] [,3] [,4]
## [1,] "ATV1" "- ATV" "Non-senescent DMSO-treated control cells" "noSen_ATV"
## [2,] "ATV2" "- ATV" "Non-senescent DMSO-treated control cells" "noSen_ATV"
## [3,] "ATV3" "- ATV" "Non-senescent DMSO-treated control cells" "noSen_ATV"
## [4,] "ATV4" "- ATV" "Non-senescent DMSO-treated control cells" "noSen_ATV"
## [5,] "ATV5" "+ ATV" "Senescent ATV-treated cells" "sen_ATV"
## [6,] "ATV6" "+ ATV" "Senescent ATV-treated cells" "sen_ATV"
## [7,] "ATV7" "+ ATV" "Senescent ATV-treated cells" "sen_ATV"
## [8,] "ATV8" "+ ATV" "Senescent ATV-treated cells" "sen_ATV"
## [9,] "midas1" "C +P 3" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [10,] "midas2" "C +P 4" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [11,] "midas3" "C +P 5" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [12,] "midas4" "C +P 6" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [13,] "midas5" "mtD -P 1" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [14,] "midas6" "mtD -P 2" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [15,] "midas7" "mtD -P 3" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [16,] "midas8" "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [17,] "midas9" "mtD -P 5" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [18,] "midas10" "mtD -P 6" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [19,] "midas11" "mtD +P 1" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [20,] "midas12" "mtD +P 2" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [21,] "midas13" "mtD +P 3" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [22,] "midas14" "mtD +P 4" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [23,] "midas15" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [24,] "midas16" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [25,] "midas17" "C -P 1" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [26,] "midas18" "C -P 2" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [27,] "midas19" "C -P 3" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [28,] "midas20" "C -P 4" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [29,] "midas21" "C -P 5" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [30,] "midas22" "C -P 6" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [31,] "midas23" "C +P 1" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [32,] "midas24" "C +P 2" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
colnames(betas) <- sampsheet$sampleID
colnames(Mvals) <- sampsheet$sampleID
row.names(sampsheet) <- sampsheet$sampleID
#Probe annotation
dat_fData <- read_tsv("~/bigdata/EWAStools/arrayAnnotation/EPIC.hg19.manifest.tsv")
length(rownames(betas)) == length(dat_fData$probeID)
## [1] TRUE
sum(rownames(betas) != dat_fData$probeID)
## [1] 865915
#Reorder annotation to match betas
dat_fData <- dat_fData[match(rownames(betas), dat_fData$probeID),]
sum(rownames(betas) != dat_fData$probeID)
## [1] 0
head(dat_fData$probeID)
## [1] "cg00000029" "cg00000103" "cg00000109" "cg00000155" "cg00000158" "cg00000165"
head(rownames(betas))
## [1] "cg00000029" "cg00000103" "cg00000109" "cg00000155" "cg00000158" "cg00000165"
row.names(dat_fData) <- dat_fData$probeID
eset_betas <- ExpressionSet(assayData = betas,
phenoData = AnnotatedDataFrame(sampsheet),
featureData = AnnotatedDataFrame(dat_fData)
)
write_rds(eset_betas, path = "../data/formatted/eset_betas.rds")
eset_Mvals <- ExpressionSet(assayData = Mvals,
phenoData = AnnotatedDataFrame(sampsheet),
featureData = AnnotatedDataFrame(dat_fData)
)
write_rds(eset_Mvals, path = "../data/formatted/eset_Mvals.rds")
The sesame pipeline above already produced cleaned data. This chunk goes backwards and examines QC of the raw data again.
Read in idats as sigsets, apply QC.
The Sigset data structure is an S4 class with 6 slots. Using lapply with readIDATpair creates a list of Sigset objects. Each list element is a Sigset for each sample. To plot or anything across samples, you’ll need to apply a function across the list.
pdat <- pData(eset_Mvals)
IDATprefixes <- searchIDATprefixes(dir.name = "../data/raw/idatFiles/")
ssets <- lapply(IDATprefixes, readIDATpair)
qc10 <- do.call(rbind, lapply(ssets, function(x) as.data.frame(sesameQC(x))))
#add sample names
sum(names(ssets) != pdat$Basename)
## [1] 0
cbind(names(ssets), pdat$Basename)
## [,1] [,2]
## [1,] "203833040074_R01C01" "203833040074_R01C01"
## [2,] "203833040074_R02C01" "203833040074_R02C01"
## [3,] "203833040074_R03C01" "203833040074_R03C01"
## [4,] "203833040074_R04C01" "203833040074_R04C01"
## [5,] "203833040074_R05C01" "203833040074_R05C01"
## [6,] "203833040074_R06C01" "203833040074_R06C01"
## [7,] "203833040074_R07C01" "203833040074_R07C01"
## [8,] "203833040074_R08C01" "203833040074_R08C01"
## [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203839170002_R01C01" "203839170002_R01C01"
## [26,] "203839170002_R02C01" "203839170002_R02C01"
## [27,] "203839170002_R03C01" "203839170002_R03C01"
## [28,] "203839170002_R04C01" "203839170002_R04C01"
## [29,] "203839170002_R05C01" "203839170002_R05C01"
## [30,] "203839170002_R06C01" "203839170002_R06C01"
## [31,] "203839170002_R07C01" "203839170002_R07C01"
## [32,] "203839170002_R08C01" "203839170002_R08C01"
qc10 <- qc10 %>%
mutate(sample_name = pdat$SampleLabel) %>%
mutate(sample_desc = pdat$Description) %>%
dplyr::select(sample_name, sample_desc, everything())
#frac_na_cg = percentage of cg probes with NA
#mean_intensity = for type II probes. Low intensity typically bad quality
#mean_beta_cg = mean beta cg probes
#frac_meth_cg = percentage cg probes with beta > 0.7
#frac_unmeth_cg = percentage cg probes with beta < 0.3
#GCT = residual incomplete bisulfite conversion. Closer to one = more complete conversion.
qcvars <- c("sample_name", "sample_desc", "num_probes_cg", "num_na_cg", "frac_na_cg", "mean_intensity", "mean_beta_cg", "frac_meth_cg", "frac_unmeth_cg", "sex", "age", "ethnicity", "GCT")
qc10 %>%
dplyr::select(any_of(qcvars)) %>%
kable
| sample_name | sample_desc | num_probes_cg | num_na_cg | frac_na_cg | mean_intensity | mean_beta_cg | frac_meth_cg | frac_unmeth_cg | sex | age | ethnicity | GCT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - ATV | Non-senescent DMSO-treated control cells | 862927 | 142190 | 16.47764 | 3853.198 | 0.4964446 | 38.22754 | 34.88263 | FEMALE | 4.832618 | WHITE | 1.129563 |
| - ATV | Non-senescent DMSO-treated control cells | 862927 | 146181 | 16.94014 | 4293.779 | 0.4969831 | 38.37119 | 34.87205 | FEMALE | 4.971872 | WHITE | 1.123527 |
| - ATV | Non-senescent DMSO-treated control cells | 862927 | 138624 | 16.06439 | 4789.837 | 0.5022481 | 38.88884 | 34.57434 | FEMALE | 4.618823 | WHITE | 1.125875 |
| - ATV | Non-senescent DMSO-treated control cells | 862927 | 147279 | 17.06738 | 4654.279 | 0.5134016 | 40.95002 | 34.04774 | FEMALE | 4.610379 | WHITE | 1.137029 |
| + ATV | Senescent ATV-treated cells | 862927 | 157665 | 18.27095 | 4822.542 | 0.5169446 | 41.47125 | 33.69882 | FEMALE | 4.455712 | WHITE | 1.130623 |
| + ATV | Senescent ATV-treated cells | 862927 | 142226 | 16.48181 | 4911.161 | 0.5186201 | 41.85134 | 33.85565 | FEMALE | 4.275990 | WHITE | 1.132838 |
| + ATV | Senescent ATV-treated cells | 862927 | 146592 | 16.98776 | 5139.840 | 0.5152724 | 41.37045 | 34.03184 | FEMALE | 4.044046 | WHITE | 1.139674 |
| + ATV | Senescent ATV-treated cells | 862927 | 144026 | 16.69040 | 5281.459 | 0.5174685 | 41.79421 | 33.98368 | FEMALE | 3.729263 | WHITE | 1.140673 |
| C +P 3 | Non-senescent mtDNA-containing controls with pyruvate | 862927 | 151212 | 17.52315 | 4266.443 | 0.5021308 | 39.58214 | 34.10059 | FEMALE | 4.937657 | WHITE | 1.139620 |
| C +P 4 | Non-senescent mtDNA-containing controls with pyruvate | 862927 | 136238 | 15.78789 | 4265.593 | 0.5080209 | 40.41674 | 34.09877 | FEMALE | 5.257289 | WHITE | 1.148064 |
| C +P 5 | Non-senescent mtDNA-containing controls with pyruvate | 862927 | 134673 | 15.60653 | 4748.199 | 0.5077737 | 41.17725 | 35.43667 | FEMALE | 3.921508 | WHITE | 1.146814 |
| C +P 6 | Non-senescent mtDNA-containing controls with pyruvate | 862927 | 131662 | 15.25761 | 4991.073 | 0.5124335 | 41.60489 | 35.41110 | FEMALE | 4.067941 | WHITE | 1.148517 |
| mtD -P 1 | Senescent mtDNA-depleted cells no pyruvate | 862927 | 151648 | 17.57368 | 4599.550 | 0.5244305 | 43.07367 | 33.03176 | FEMALE | 4.194471 | WHITE | 1.137215 |
| mtD -P 2 | Senescent mtDNA-depleted cells no pyruvate | 862927 | 135063 | 15.65173 | 4513.403 | 0.5291737 | 43.78126 | 33.12624 | FEMALE | 4.220530 | WHITE | 1.146916 |
| mtD -P 3 | Senescent mtDNA-depleted cells no pyruvate | 862927 | 136418 | 15.80875 | 5172.337 | 0.5196899 | 42.87297 | 35.09619 | FEMALE | 4.421055 | WHITE | 1.152611 |
| mtD -P 4 | Senescent mtDNA-depleted cells no pyruvate | 862927 | 137301 | 15.91108 | 5378.971 | 0.5184395 | 42.65737 | 35.20670 | FEMALE | 3.933508 | WHITE | 1.148173 |
| mtD -P 5 | Senescent mtDNA-depleted cells no pyruvate | 862927 | 140866 | 16.32421 | 3835.542 | 0.5115558 | 41.23613 | 33.90725 | FEMALE | 4.985296 | WHITE | 1.144645 |
| mtD -P 6 | Senescent mtDNA-depleted cells no pyruvate | 862927 | 140723 | 16.30764 | 4089.288 | 0.5136177 | 41.56194 | 33.88461 | FEMALE | 3.777414 | WHITE | 1.144287 |
| mtD +P 1 | Non-senescent mtDNA-depleted controls cells with pyruvate | 862927 | 128391 | 14.87855 | 4923.509 | 0.5097017 | 41.40995 | 35.61337 | FEMALE | 5.212871 | WHITE | 1.156965 |
| mtD +P 2 | Non-senescent mtDNA-depleted controls cells with pyruvate | 862927 | 123732 | 14.33864 | 5422.586 | 0.5031447 | 40.20928 | 36.20168 | FEMALE | 4.542549 | WHITE | 1.163880 |
| mtD +P 3 | Non-senescent mtDNA-depleted controls cells with pyruvate | 862927 | 133861 | 15.51244 | 5411.625 | 0.5134522 | 41.52793 | 34.09156 | FEMALE | 3.963279 | WHITE | 1.149286 |
| mtD +P 4 | Non-senescent mtDNA-depleted controls cells with pyruvate | 862927 | 124219 | 14.39508 | 5579.510 | 0.5095529 | 40.86716 | 34.56318 | FEMALE | 4.409198 | WHITE | 1.152347 |
| mtD +P 5 | Non-senescent mtDNA-depleted controls cells with pyruvate | 862927 | 125684 | 14.56485 | 5870.422 | 0.5174637 | 42.35266 | 35.16778 | FEMALE | 3.988534 | WHITE | 1.154209 |
| mtD +P 6 | Non-senescent mtDNA-depleted controls cells with pyruvate | 862927 | 127452 | 14.76973 | 5692.319 | 0.5127907 | 41.76117 | 35.44961 | FEMALE | 3.433061 | WHITE | 1.151243 |
| C -P 1 | Non-senescent mtDNA-containing controls no pyruvate | 862927 | 145220 | 16.82877 | 4081.942 | 0.5077514 | 40.44324 | 33.90966 | FEMALE | 4.977316 | WHITE | 1.142858 |
| C -P 2 | Non-senescent mtDNA-containing controls no pyruvate | 862927 | 143572 | 16.63779 | 4457.889 | 0.5109200 | 40.89400 | 33.76163 | FEMALE | 4.583926 | WHITE | 1.131215 |
| C -P 3 | Non-senescent mtDNA-containing controls no pyruvate | 862927 | 138742 | 16.07807 | 5237.084 | 0.5069778 | 41.07597 | 35.48624 | FEMALE | 4.227716 | WHITE | 1.141324 |
| C -P 4 | Non-senescent mtDNA-containing controls no pyruvate | 862927 | 129921 | 15.05585 | 5424.897 | 0.5104271 | 41.43759 | 35.57120 | FEMALE | 4.422332 | WHITE | 1.151486 |
| C -P 5 | Non-senescent mtDNA-containing controls no pyruvate | 862927 | 143311 | 16.60755 | 5429.236 | 0.5192146 | 42.30506 | 33.53525 | FEMALE | 3.666028 | WHITE | 1.141509 |
| C -P 6 | Non-senescent mtDNA-containing controls no pyruvate | 862927 | 134557 | 15.59309 | 5365.433 | 0.5218914 | 42.72060 | 33.64677 | FEMALE | 3.943228 | WHITE | 1.143762 |
| C +P 1 | Non-senescent mtDNA-containing controls with pyruvate | 862927 | 137805 | 15.96949 | 5765.728 | 0.5133604 | 42.13291 | 35.43114 | FEMALE | 3.655590 | WHITE | 1.147457 |
| C +P 2 | Non-senescent mtDNA-containing controls with pyruvate | 862927 | 135729 | 15.72891 | 5567.668 | 0.5173618 | 42.70515 | 35.54616 | FEMALE | 3.922531 | WHITE | 1.142287 |
The fraction of NAs are signs of masking due to variety of reasons including failed detection, high background, putative low quality probes etc.
Now we move back to the QCed betas generated from the first step.
First, remove failed samples and probes. How many samples are missing across all probes? How many probes are missing across all samples?
dim(eset_Mvals) #865,918 probes
## Features Samples
## 865918 32
e <- exprs(eset_Mvals)
pdat <- pData(eset_Mvals)
miss_sample <- apply(e, 2, function(x) sum(is.na(x))/length(x) )
miss_probe_blank <- apply(e, 1, function(x) sum(is.na(x))/length(x) )
length(miss_sample)
## [1] 32
length(miss_probe_blank)
## [1] 865918
sum(miss_sample >= 0.95) #0 samples have missing rate greater than 95%
## [1] 0
#no samples that are essentially blanks.
#show missingness for all samples
data.frame(sampleID = names(miss_sample), treatment = pdat$treatment, miss_sample = miss_sample, stringsAsFactors = F) %>%
kable()
| sampleID | treatment | miss_sample | |
|---|---|---|---|
| ATV1 | ATV1 | noSen_ATV | 0.1653240 |
| ATV2 | ATV2 | noSen_ATV | 0.1699641 |
| ATV3 | ATV3 | noSen_ATV | 0.1611273 |
| ATV4 | ATV4 | noSen_ATV | 0.1712310 |
| ATV5 | ATV5 | sen_ATV | 0.1833418 |
| ATV6 | ATV6 | sen_ATV | 0.1653228 |
| ATV7 | ATV7 | sen_ATV | 0.1704168 |
| ATV8 | ATV8 | sen_ATV | 0.1674177 |
| midas1 | midas1 | noSen_Mt_Py | 0.1760120 |
| midas2 | midas2 | noSen_Mt_Py | 0.1584596 |
| midas3 | midas3 | noSen_Mt_Py | 0.1565876 |
| midas4 | midas4 | noSen_Mt_Py | 0.1530457 |
| midas5 | midas5 | sen_noMt_noPy | 0.1764867 |
| midas6 | midas6 | sen_noMt_noPy | 0.1570611 |
| midas7 | midas7 | sen_noMt_noPy | 0.1586074 |
| midas8 | midas8 | sen_noMt_noPy | 0.1596098 |
| midas9 | midas9 | sen_noMt_noPy | 0.1638769 |
| midas10 | midas10 | sen_noMt_noPy | 0.1636991 |
| midas11 | midas11 | noSen_noMt_Py | 0.1491850 |
| midas12 | midas12 | noSen_noMt_Py | 0.1437215 |
| midas13 | midas13 | noSen_noMt_Py | 0.1555979 |
| midas14 | midas14 | noSen_noMt_Py | 0.1443035 |
| midas15 | midas15 | noSen_noMt_Py | 0.1459977 |
| midas16 | midas16 | noSen_noMt_Py | 0.1480683 |
| midas17 | midas17 | noSen_Mt_noPy | 0.1689594 |
| midas18 | midas18 | noSen_Mt_noPy | 0.1670193 |
| midas19 | midas19 | noSen_Mt_noPy | 0.1613236 |
| midas20 | midas20 | noSen_Mt_noPy | 0.1510028 |
| midas21 | midas21 | noSen_Mt_noPy | 0.1666925 |
| midas22 | midas22 | noSen_Mt_noPy | 0.1564282 |
| midas23 | midas23 | noSen_Mt_Py | 0.1601918 |
| midas24 | midas24 | noSen_Mt_Py | 0.1577701 |
sum(miss_probe_blank >= 1) #119,631 completely missing probes. Many are masked by design.
## [1] 119631
sum(miss_probe_blank >= 0.95) #121,925 Remove these, then determine number of samples with missing rate>0.05.
## [1] 121925
#keep probes with < 0.95 missingness
eset_Mvals_clean <- eset_Mvals[miss_probe_blank < 0.95,]
dim(eset_Mvals_clean) #743,993 probes
## Features Samples
## 743993 32
eset_betas_clean <- eset_betas[miss_probe_blank < 0.95,]
dim(eset_betas_clean)
## Features Samples
## 743993 32
#After removing failed probes, estimate sample missing rate.
e <- exprs(eset_Mvals_clean)
miss_sample <- apply(e, 2, function(x) sum(is.na(x))/length(x) )
miss_probe <- apply(e, 1, function(x) sum(is.na(x))/length(x) )
data.frame(sampleID = names(miss_sample), treatment = pdat$treatment, miss_sample = miss_sample, stringsAsFactors = F) %>%
kable()
| sampleID | treatment | miss_sample | |
|---|---|---|---|
| ATV1 | ATV1 | noSen_ATV | 0.0285849 |
| ATV2 | ATV2 | noSen_ATV | 0.0339506 |
| ATV3 | ATV3 | noSen_ATV | 0.0237166 |
| ATV4 | ATV4 | noSen_ATV | 0.0354264 |
| ATV5 | ATV5 | sen_ATV | 0.0495099 |
| ATV6 | ATV6 | sen_ATV | 0.0285540 |
| ATV7 | ATV7 | sen_ATV | 0.0344788 |
| ATV8 | ATV8 | sen_ATV | 0.0310030 |
| midas1 | midas1 | noSen_Mt_Py | 0.0409843 |
| midas2 | midas2 | noSen_Mt_Py | 0.0205997 |
| midas3 | midas3 | noSen_Mt_Py | 0.0184061 |
| midas4 | midas4 | noSen_Mt_Py | 0.0143026 |
| midas5 | midas5 | sen_noMt_noPy | 0.0415300 |
| midas6 | midas6 | sen_noMt_noPy | 0.0189814 |
| midas7 | midas7 | sen_noMt_noPy | 0.0207341 |
| midas8 | midas8 | sen_noMt_noPy | 0.0219034 |
| midas9 | midas9 | sen_noMt_noPy | 0.0268739 |
| midas10 | midas10 | sen_noMt_noPy | 0.0266669 |
| midas11 | midas11 | noSen_noMt_Py | 0.0098737 |
| midas12 | midas12 | noSen_noMt_Py | 0.0044113 |
| midas13 | midas13 | noSen_noMt_Py | 0.0172475 |
| midas14 | midas14 | noSen_noMt_Py | 0.0048025 |
| midas15 | midas15 | noSen_noMt_Py | 0.0064073 |
| midas16 | midas16 | noSen_noMt_Py | 0.0086627 |
| midas17 | midas17 | noSen_Mt_noPy | 0.0327732 |
| midas18 | midas18 | noSen_Mt_noPy | 0.0305164 |
| midas19 | midas19 | noSen_Mt_noPy | 0.0238873 |
| midas20 | midas20 | noSen_Mt_noPy | 0.0119504 |
| midas21 | midas21 | noSen_Mt_noPy | 0.0301347 |
| midas22 | midas22 | noSen_Mt_noPy | 0.0181991 |
| midas23 | midas23 | noSen_Mt_Py | 0.0225714 |
| midas24 | midas24 | noSen_Mt_Py | 0.0197663 |
#After removing failed probes, estimate probe missing rate
sum(miss_probe == 0) #692,879 probes with no missings
## [1] 692879
sum(miss_probe > 0) #51,114 probes with missings in at least one sample
## [1] 51114
miss_probe_f <- cut(miss_probe, breaks = seq(0, 1, 0.1))
table(miss_probe_f)
## miss_probe_f
## (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8] (0.8,0.9] (0.9,1]
## 16557 6662 5351 3599 3875 2704 2740 2910 3672 3044
data.frame(missing_interval = miss_probe_f) %>%
filter(!is.na(missing_interval)) %>%
ggplot(aes(missing_interval)) +
geom_bar()
Summary of sample IDs and descriptions
p1 <- pData(eset_Mvals_clean)
p1 %>%
dplyr::select(sampleID, Description) %>%
kable()
| sampleID | Description |
|---|---|
| ATV1 | Non-senescent DMSO-treated control cells |
| ATV2 | Non-senescent DMSO-treated control cells |
| ATV3 | Non-senescent DMSO-treated control cells |
| ATV4 | Non-senescent DMSO-treated control cells |
| ATV5 | Senescent ATV-treated cells |
| ATV6 | Senescent ATV-treated cells |
| ATV7 | Senescent ATV-treated cells |
| ATV8 | Senescent ATV-treated cells |
| midas1 | Non-senescent mtDNA-containing controls with pyruvate |
| midas2 | Non-senescent mtDNA-containing controls with pyruvate |
| midas3 | Non-senescent mtDNA-containing controls with pyruvate |
| midas4 | Non-senescent mtDNA-containing controls with pyruvate |
| midas5 | Senescent mtDNA-depleted cells no pyruvate |
| midas6 | Senescent mtDNA-depleted cells no pyruvate |
| midas7 | Senescent mtDNA-depleted cells no pyruvate |
| midas8 | Senescent mtDNA-depleted cells no pyruvate |
| midas9 | Senescent mtDNA-depleted cells no pyruvate |
| midas10 | Senescent mtDNA-depleted cells no pyruvate |
| midas11 | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas12 | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas13 | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas14 | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas15 | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas16 | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas17 | Non-senescent mtDNA-containing controls no pyruvate |
| midas18 | Non-senescent mtDNA-containing controls no pyruvate |
| midas19 | Non-senescent mtDNA-containing controls no pyruvate |
| midas20 | Non-senescent mtDNA-containing controls no pyruvate |
| midas21 | Non-senescent mtDNA-containing controls no pyruvate |
| midas22 | Non-senescent mtDNA-containing controls no pyruvate |
| midas23 | Non-senescent mtDNA-containing controls with pyruvate |
| midas24 | Non-senescent mtDNA-containing controls with pyruvate |
boxplot(exprs(eset_Mvals_clean), las = 2, ylab = "M-values")
abline(h = median(exprs(eset_Mvals_clean)), col = "blue")
plotDensities(eset_betas_clean, main = "Betas", legend = FALSE)
plotDensities(eset_Mvals_clean, main = "M values", legend = FALSE)
p1 <- pData(eset_Mvals_clean)
p1 %>%
dplyr::select(sampleID, SampleLabel, Description, treatment) %>%
kable()
| sampleID | SampleLabel | Description | treatment |
|---|---|---|---|
| ATV1 | - ATV | Non-senescent DMSO-treated control cells | noSen_ATV |
| ATV2 | - ATV | Non-senescent DMSO-treated control cells | noSen_ATV |
| ATV3 | - ATV | Non-senescent DMSO-treated control cells | noSen_ATV |
| ATV4 | - ATV | Non-senescent DMSO-treated control cells | noSen_ATV |
| ATV5 | + ATV | Senescent ATV-treated cells | sen_ATV |
| ATV6 | + ATV | Senescent ATV-treated cells | sen_ATV |
| ATV7 | + ATV | Senescent ATV-treated cells | sen_ATV |
| ATV8 | + ATV | Senescent ATV-treated cells | sen_ATV |
| midas1 | C +P 3 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas2 | C +P 4 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas3 | C +P 5 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas4 | C +P 6 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas5 | mtD -P 1 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas6 | mtD -P 2 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas7 | mtD -P 3 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas8 | mtD -P 4 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas9 | mtD -P 5 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas10 | mtD -P 6 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas11 | mtD +P 1 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas12 | mtD +P 2 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas13 | mtD +P 3 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas14 | mtD +P 4 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas15 | mtD +P 5 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas16 | mtD +P 6 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas17 | C -P 1 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas18 | C -P 2 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas19 | C -P 3 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas20 | C -P 4 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas21 | C -P 5 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas22 | C -P 6 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas23 | C +P 1 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas24 | C +P 2 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "senATV" | p1$treatment == "noSen_ATV")]
plotDensities(eset_Mvals_sub, main = "M values ATV", legend = TRUE)
eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "sen_noMt_noPy" | p1$treatment == "noSen_noMt_Py" | p1$treatment == "noSen_Mt_noPy" | p1$treatment == "noSen_Mt_Py")]
plotDensities(eset_Mvals_sub, main = "M values MiDAS sen and controls", legend = TRUE)
eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "sen_noMt_noPy")]
plotDensities(eset_Mvals_sub, main = "M values MiDAS sen", legend = TRUE)
eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "noSen_noMt_Py" | p1$treatment == "noSen_Mt_noPy" | p1$treatment == "noSen_Mt_Py")]
plotDensities(eset_Mvals_sub, main = "M values MiDAS controls", legend = TRUE)
eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "noSen_noMt_Py")]
plotDensities(eset_Mvals_sub, main = "M values MiDAS controls no MT Py", legend = TRUE)
plot_MDS <- function(mylabel, myeset){
group <- pData(myeset)[[mylabel]]
col.group <- group
levels(col.group) <- brewer.pal(nlevels(col.group), "Dark2")
col.group.ch <- as.character(col.group)
plotMDS(myeset, col = col.group.ch, gene.selection = "common", main = paste(mylabel, "labels"), pch = 16)
legend("topleft", fill = levels(col.group), legend = levels(group))
}
plot_MDS(mylabel = "treatment", myeset = eset_betas_clean)
plot_MDS(mylabel = "treatment", myeset = eset_Mvals_clean)
plot_MDS <- function(mylabel, myeset){
group <- pData(myeset)[[mylabel]]
col.group <- group
levels(col.group) <- brewer.pal(nlevels(col.group), "Set1")
col.group.ch <- as.character(col.group)
plotMDS(myeset, col = col.group.ch, gene.selection = "common", main = paste(mylabel, "labels"), pch = 16)
legend("top", fill = levels(col.group), legend = levels(group))
}
keep <- str_detect(pData(eset_Mvals_clean)$sampleID, "ATV")
eset_sub <- eset_Mvals_clean[, keep]
plot_MDS(mylabel = "treatment", myeset = eset_sub)
myMDS <- plotMDS(eset_Mvals_clean, gene.selection = "common", plot = FALSE)
cluster_top <- myMDS$y[myMDS$y > 0.2 ]
cluster_bottom <- myMDS$x[myMDS$x > 1 ]
cluster_ATV <- myMDS$x[myMDS$x < (-2) ]
#cluster top
pData(eset_Mvals_clean) %>%
filter(sampleID %in% names(cluster_top)) %>%
dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
kable
| sampleID | SampleLabel | treatment | Description |
|---|---|---|---|
| midas1 | C +P 3 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas2 | C +P 4 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas5 | mtD -P 1 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas6 | mtD -P 2 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas9 | mtD -P 5 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas10 | mtD -P 6 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas13 | mtD +P 3 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas14 | mtD +P 4 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas17 | C -P 1 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas18 | C -P 2 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas21 | C -P 5 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas22 | C -P 6 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
#cluster bottom
pData(eset_Mvals_clean) %>%
filter(sampleID %in% names(cluster_bottom)) %>%
dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
kable
| sampleID | SampleLabel | treatment | Description |
|---|---|---|---|
| midas3 | C +P 5 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas4 | C +P 6 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas7 | mtD -P 3 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas8 | mtD -P 4 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas11 | mtD +P 1 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas12 | mtD +P 2 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas15 | mtD +P 5 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas16 | mtD +P 6 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas19 | C -P 3 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas20 | C -P 4 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas23 | C +P 1 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas24 | C +P 2 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
#cluster ATV
pData(eset_Mvals_clean) %>%
filter(sampleID %in% names(cluster_ATV)) %>%
dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
kable
| sampleID | SampleLabel | treatment | Description |
|---|---|---|---|
| ATV1 | - ATV | noSen_ATV | Non-senescent DMSO-treated control cells |
| ATV2 | - ATV | noSen_ATV | Non-senescent DMSO-treated control cells |
| ATV3 | - ATV | noSen_ATV | Non-senescent DMSO-treated control cells |
| ATV4 | - ATV | noSen_ATV | Non-senescent DMSO-treated control cells |
| ATV5 | + ATV | sen_ATV | Senescent ATV-treated cells |
| ATV6 | + ATV | sen_ATV | Senescent ATV-treated cells |
| ATV7 | + ATV | sen_ATV | Senescent ATV-treated cells |
| ATV8 | + ATV | sen_ATV | Senescent ATV-treated cells |
plotDensities(eset_Mvals_clean, main = "M values all samples", legend = FALSE)
p1 <- pData(eset_Mvals_clean)
p1 %>%
dplyr::select(sampleID, SampleLabel, Description, treatment) %>%
kable()
| sampleID | SampleLabel | Description | treatment |
|---|---|---|---|
| ATV1 | - ATV | Non-senescent DMSO-treated control cells | noSen_ATV |
| ATV2 | - ATV | Non-senescent DMSO-treated control cells | noSen_ATV |
| ATV3 | - ATV | Non-senescent DMSO-treated control cells | noSen_ATV |
| ATV4 | - ATV | Non-senescent DMSO-treated control cells | noSen_ATV |
| ATV5 | + ATV | Senescent ATV-treated cells | sen_ATV |
| ATV6 | + ATV | Senescent ATV-treated cells | sen_ATV |
| ATV7 | + ATV | Senescent ATV-treated cells | sen_ATV |
| ATV8 | + ATV | Senescent ATV-treated cells | sen_ATV |
| midas1 | C +P 3 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas2 | C +P 4 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas3 | C +P 5 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas4 | C +P 6 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas5 | mtD -P 1 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas6 | mtD -P 2 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas7 | mtD -P 3 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas8 | mtD -P 4 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas9 | mtD -P 5 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas10 | mtD -P 6 | Senescent mtDNA-depleted cells no pyruvate | sen_noMt_noPy |
| midas11 | mtD +P 1 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas12 | mtD +P 2 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas13 | mtD +P 3 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas14 | mtD +P 4 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas15 | mtD +P 5 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas16 | mtD +P 6 | Non-senescent mtDNA-depleted controls cells with pyruvate | noSen_noMt_Py |
| midas17 | C -P 1 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas18 | C -P 2 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas19 | C -P 3 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas20 | C -P 4 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas21 | C -P 5 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas22 | C -P 6 | Non-senescent mtDNA-containing controls no pyruvate | noSen_Mt_noPy |
| midas23 | C +P 1 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
| midas24 | C +P 2 | Non-senescent mtDNA-containing controls with pyruvate | noSen_Mt_Py |
eset_Mvals_sub <- eset_Mvals_clean[, p1$sampleID %in% names(cluster_top)]
plotDensities(eset_Mvals_sub, main = "M values cluster top", legend = TRUE)
eset_Mvals_sub <- eset_Mvals_clean[, p1$sampleID %in% names(cluster_bottom)]
plotDensities(eset_Mvals_sub, main = "M values cluster bottom", legend = TRUE)
eset_Mvals_sub <- eset_Mvals_clean[, p1$sampleID %in% names(cluster_ATV)]
plotDensities(eset_Mvals_sub, main = "M values cluster ATV", legend = TRUE)
Group means parameterization. No intercept.
\[ Y = B_1X_1 + B_2X_2 + B_3X_3 + B_4X_4 + B_5X_5 + B_6X_6 + \epsilon \]
\(B_1\) = mean expression level in noSen_ATV.
\(B_2\) = mean expression level in noSen_Mt_Py.
\(B_3\) = mean expression level in noSen_Mt_noPy.
\(B_4\) = mean expression level in noSen_noMt_Py.
\(B_5\) = mean expression level in sen_ATV.
\(B_6\) = mean expression level in sen_noMt_noPy.
After model fit, contrasts are set that represent difference in group means.
logFC = estimate of the log2-fold-change
CI.L = lower 95% confidence interval for logFC
CI.R = upper 95% confidence interval for logFC
AveExpr = average log2-expression for the probe over all samples
t = moderated t-statistic
P.value = unadjusted P-value
adj.P.value = BH 1995 adjusted P-value
B = log-odds that the variable is differentially abundant.
From the limma user guide: Suppose for example that B = 1.5. The odds of differential expression is exp(1.5)=4.48, i.e, about four and a half to one. The probability that the gene is differentially expressed is 4.48/(1+4.48)=0.82, i.e., the probability is about 82% that this gene is differentially expressed. A B-statistic of zero corresponds to a 50-50 chance that the gene is differentially expressed. The B-statistic is automatically adjusted for multiple testing by assuming that 1% of the genes are expected to be differentially expressed. The p-values and B-statistics will normally rank genes in the same order.
design <- model.matrix(~0 + treatment, data = pData(eset_Mvals_clean))
design
## treatmentnoSen_ATV treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py treatmentsen_ATV treatmentsen_noMt_noPy
## ATV1 1 0 0 0 0 0
## ATV2 1 0 0 0 0 0
## ATV3 1 0 0 0 0 0
## ATV4 1 0 0 0 0 0
## ATV5 0 0 0 0 1 0
## ATV6 0 0 0 0 1 0
## ATV7 0 0 0 0 1 0
## ATV8 0 0 0 0 1 0
## midas1 0 1 0 0 0 0
## midas2 0 1 0 0 0 0
## midas3 0 1 0 0 0 0
## midas4 0 1 0 0 0 0
## midas5 0 0 0 0 0 1
## midas6 0 0 0 0 0 1
## midas7 0 0 0 0 0 1
## midas8 0 0 0 0 0 1
## midas9 0 0 0 0 0 1
## midas10 0 0 0 0 0 1
## midas11 0 0 0 1 0 0
## midas12 0 0 0 1 0 0
## midas13 0 0 0 1 0 0
## midas14 0 0 0 1 0 0
## midas15 0 0 0 1 0 0
## midas16 0 0 0 1 0 0
## midas17 0 0 1 0 0 0
## midas18 0 0 1 0 0 0
## midas19 0 0 1 0 0 0
## midas20 0 0 1 0 0 0
## midas21 0 0 1 0 0 0
## midas22 0 0 1 0 0 0
## midas23 0 1 0 0 0 0
## midas24 0 1 0 0 0 0
## attr(,"assign")
## [1] 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$treatment
## [1] "contr.treatment"
colSums(design)
## treatmentnoSen_ATV treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py treatmentsen_ATV treatmentsen_noMt_noPy
## 4 6 6 6 4 6
cm <- makeContrasts(ATVvControl = treatmentsen_ATV - treatmentnoSen_ATV,
SenMidasvNoSenMtnoPy = treatmentsen_noMt_noPy - treatmentnoSen_Mt_noPy,
SenMidasvNoSenNoMtPy = treatmentsen_noMt_noPy - treatmentnoSen_noMt_Py,
levels = design
)
fit <- lmFit(eset_Mvals_clean, design)
## Warning: Partial NA coefficients for 19911 probe(s)
fit2 <- contrasts.fit(fit, contrasts = cm)
fit2 <- eBayes(fit2)
results <- decideTests(fit2)
summary(results)
## ATVvControl SenMidasvNoSenMtnoPy SenMidasvNoSenNoMtPy
## Down 1 0 25109
## NotSig 725639 734633 705171
## Up 1 0 5949
vennDiagram(results)
volcanoplot(fit2, coef = "ATVvControl" , main = "ATV vs control")
volcanoplot(fit2, coef = "SenMidasvNoSenMtnoPy", main = "MiDAS vs control with MtDNA")
volcanoplot(fit2, coef = "SenMidasvNoSenNoMtPy", main = "MiDAS vs control without mtDNA")
annot_cols <- c("CpG_chrm", "CpG_beg", "probeID", "gene")
topTable(fit2, coef = "ATVvControl", genelist = fit2$genes[,annot_cols], p.value = 0.05, confint = TRUE) %>%
kable()
| CpG_chrm | CpG_beg | probeID | gene | logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cg05454501 | chr5 | 76373390 | cg05454501 | ZBED3 | -0.7813445 | -1.239024 | -0.3236651 | -5.109639 | -7.087593 | 1e-07 | 0.0253733 | 3.682465 |
| cg25892587 | chr10 | 3666168 | cg25892587 | NA | 4.4394857 | 4.179659 | 4.6993119 | -1.980614 | 15.395070 | 0e+00 | 0.0020826 | -1.136984 |
topTable(fit2, coef = "SenMidasvNoSenMtnoPy", genelist = fit2$genes[,annot_cols], p.value = 0.05, confint = TRUE) %>%
kable()
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames = list(: The table should have a header (column names)
|| || || ||
topTable(fit2, coef = "SenMidasvNoSenNoMtPy", genelist = fit2$genes[,annot_cols], p.value = 0.05, confint = TRUE) %>%
kable()
| CpG_chrm | CpG_beg | probeID | gene | logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cg00675940 | chr5 | 42923549 | cg00675940 | NA | -1.4071519 | -1.5738946 | -1.2404091 | 1.3693795 | -8.245859 | 0 | 0.0013562 | 10.494661 |
| cg21545248 | chr5 | 149402497 | cg21545248 | HMGXB3;RPS20P4 | -0.7937433 | -1.1107432 | -0.4767434 | 0.5253161 | -8.205613 | 0 | 0.0013562 | 10.406157 |
| cg21189727 | chr8 | 67940949 | cg21189727 | PPP1R42 | -0.7814580 | -1.0559005 | -0.5070156 | -3.9713074 | -7.872901 | 0 | 0.0021366 | 9.664540 |
| cg02780400 | chr10 | 43187468 | cg02780400 | LINC01518 | -0.6788858 | -0.8948734 | -0.4628981 | 0.9014428 | -7.648428 | 0 | 0.0025989 | 9.154314 |
| cg26591221 | chr16 | 88942334 | cg26591221 | CBFA2T3 | -0.9453924 | -1.1290296 | -0.7617553 | 4.2901905 | -7.590883 | 0 | 0.0025989 | 9.022258 |
| cg00329052 | chr4 | 169935000 | cg00329052 | NA | -1.4317712 | -1.7298707 | -1.1336717 | 1.4300259 | -7.471123 | 0 | 0.0025989 | 8.745811 |
| cg09107055 | chr16 | 46854829 | cg09107055 | C16orf87 | -1.0151181 | -1.4286880 | -0.6015482 | -0.3137258 | -7.451222 | 0 | 0.0025989 | 8.699662 |
| cg00402311 | chr8 | 62282814 | cg00402311 | CLVS1 | -0.8368593 | -1.0099842 | -0.6637343 | 0.1290056 | -7.426037 | 0 | 0.0025989 | 8.641177 |
| cg03338185 | chr1 | 16304820 | cg03338185 | NA | -1.1278547 | -1.3348646 | -0.9208447 | 1.0472024 | -7.325484 | 0 | 0.0026091 | 8.406727 |
| cg04398932 | chr12 | 133656737 | cg04398932 | ZNF140 | 0.8484672 | 0.6160453 | 1.0808890 | -2.8908636 | 7.398473 | 0 | 0.0026091 | 8.389052 |
#save results
resultAll <- topTable(fit2, coef = "ATVvControl", sort.by = "none", number = nrow(fit2), confint = TRUE)
write_csv(resultAll, path = paste0(dir_out, "ATVvControl.csv"))
## Warning: The `path` argument of `write_csv()` is deprecated as of readr 1.4.0.
## Please use the `file` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
resultAll <- topTable(fit2, coef = "SenMidasvNoSenMtnoPy", sort.by = "none", number = nrow(fit2), confint = TRUE)
write_csv(resultAll, path = paste0(dir_out, "SenMidasvNoSenMtnoPy.csv"))
resultAll <- topTable(fit2, coef = "SenMidasvNoSenNoMtPy", sort.by = "none", number = nrow(fit2), confint = TRUE)
write_csv(resultAll, path = paste0(dir_out, "SenMidasvNoSenNoMtPy.csv"))
Array annotation downloaded here.
Old results are my quick t-tests performed on Nate’s Noob-normalized data using minfi. I saw lots of significant ATV associations in old results, but not from the sesame-based results. What are the differences?
There are many differences between old and new ATV results. Differences include:
Preprocessing. Old results from minfi noob preprocessing. New results from sesame noob and pOOBAH.
Methylation values. Old results calculated from Betas. New results calculated from M-values.
Association testing. Old results calculated from t-test. New results calculated from moderated t-tests using limma. Limma should have more power, and should be more robust because variance won’t be influenced by probes with high variance.
Results below show there are many differences between old results and new Sesame with limma.
fdat <- read_tsv("~/bigdata/EWAStools/arrayAnnotation/EPIC.hg19.manifest.tsv")
dat <- read_csv(paste0(dir_out, "ATVvControl.csv"))
datOld <- read_csv(paste0(dir_out, "probewise_ATV.csv"))
datOld %>%
filter(P_BH <= 0.05) %>%
left_join(fdat, by = "probeID") %>%
summarize(n_masked = sum(MASK_general), n_not_masked = sum(!MASK_general))
## # A tibble: 1 x 2
## n_masked n_not_masked
## <int> <int>
## 1 13 167
names(datOld)[2:4] <- paste("v1", names(datOld)[2:4], sep = "_")
datOld %>%
filter(v1_P_BH <= 0.05) %>%
left_join(dat, by = "probeID") %>%
select(probeID, v1_T_STAT, v1_P, v1_P_BH, t, P.Value, adj.P.Val, everything()) %>%
arrange(v1_P) %>%
slice_head(n = 10)
## # A tibble: 10 x 68
## probeID v1_T_STAT v1_P v1_P_BH t P.Value adj.P.Val CpG_chrm CpG_beg CpG_end probe_strand address_A address_B channel designType nextBase nextBaseRef probeType orientation probeCpGcnt context35 probeBeg
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 cg1625~ 37.7 2.35e-8 0.0166 0.207 0.838 1.00 chr6 4.71e7 4.71e7 + 31663185 NA Both II G/A C cg up 1 1 4.71e7
## 2 cg1352~ 36.8 4.74e-8 0.0166 0.00730 0.994 1.00 chr12 6.50e7 6.50e7 - 97785272 NA Both II G/A C cg down 1 2 6.50e7
## 3 cg0258~ 35.1 5.75e-8 0.0166 0.130 0.898 1.00 chr13 3.42e7 3.42e7 + 22661284 NA Both II G/A C cg up 1 1 3.42e7
## 4 cg1602~ 28.7 1.38e-7 0.0226 0.486 0.632 1.00 chr2 4.37e7 4.37e7 - 36620188 NA Both II G/A C cg down 1 2 4.37e7
## 5 cg1397~ 36.9 1.50e-7 0.0226 0.00566 0.996 1.00 chr1 1.63e8 1.63e8 + 80673585 NA Both II G/A C cg up 1 1 1.63e8
## 6 cg1679~ 26.5 1.96e-7 0.0226 -0.0488 0.961 1.00 chr10 7.19e7 7.19e7 - 55810162 NA Both II G/A C cg down 1 2 7.19e7
## 7 cg2306~ 26.8 2.06e-7 0.0226 NA NA NA <NA> NA NA <NA> NA NA <NA> <NA> <NA> <NA> <NA> <NA> NA NA NA
## 8 cg1669~ 30.8 2.12e-7 0.0226 0.222 0.825 1.00 chr3 7.11e7 7.11e7 - 73768831 NA Both II G/A C cg down 0 1 7.11e7
## 9 cg1783~ 36.7 2.41e-7 0.0226 -0.0778 0.938 1.00 chr17 1.98e7 1.98e7 + 16806277 NA Both II G/A C cg up 1 1 1.98e7
## 10 cg1469~ 33.4 2.62e-7 0.0226 -0.204 0.839 1.00 chr12 9.25e7 9.25e7 + 88600417 NA Both II G/A C cg up 1 1 9.25e7
## # ... with 46 more variables: probeEnd <dbl>, ProbeSeq_A <chr>, ProbeSeq_B <chr>, gene <chr>, gene_HGNC <chr>, chrm_A <chr>, beg_A <dbl>, flag_A <dbl>, mapQ_A <dbl>, cigar_A <chr>, NM_A <dbl>, chrm_B <chr>, beg_B <dbl>,
## # flag_B <dbl>, mapQ_B <dbl>, cigar_B <chr>, NM_B <dbl>, wDecoy_chrm_A <chr>, wDecoy_beg_A <dbl>, wDecoy_flag_A <dbl>, wDecoy_mapQ_A <dbl>, wDecoy_cigar_A <chr>, wDecoy_NM_A <dbl>, wDecoy_chrm_B <chr>,
## # wDecoy_beg_B <dbl>, wDecoy_flag_B <dbl>, wDecoy_mapQ_B <dbl>, wDecoy_cigar_B <chr>, wDecoy_NM_B <dbl>, posMatch <lgl>, MASK_mapping <lgl>, MASK_typeINextBaseSwitch <lgl>, MASK_rmsk15 <lgl>, MASK_sub40_copy <lgl>,
## # MASK_sub35_copy <lgl>, MASK_sub30_copy <lgl>, MASK_sub25_copy <lgl>, MASK_snp5_common <lgl>, MASK_snp5_GMAF1p <lgl>, MASK_extBase <lgl>, MASK_general <lgl>, logFC <dbl>, CI.L <dbl>, CI.R <dbl>, AveExpr <dbl>,
## # B <dbl>
datMerge <- inner_join(datOld, dat, by = "probeID")
cor1 <- cor(datMerge$v1_T_STAT, datMerge$t, use = "complete.obs")
if(cor1 < 0.1) cor1 <- 0.1
datOld %>%
inner_join(dat, by = "probeID") %>%
filter(!is.na(v1_T_STAT)) %>%
filter(!is.na(t)) %>%
ggplot(aes(x = v1_T_STAT, y = t)) +
geom_hex(binwidth = c(cor1, cor1)) +
labs(x = "old results (minfi noob t-test)",
y = "new results",
caption = "Old results are from Nate minfi noob data with Dan t-test."
)
# Minfi processing
Since the sesame-based results are so different from the noob-normalized minfi results, let me try minfi noob myself, with some additional QC steps. Besides, the minfi noob results had many significant associations.
This code chunk creates a noob processed dataset from the raw data. Should be the same as data from Nate.
Output is a matrix of betas. Then, apply unpaired t-test and compare to my previous results. Results should replicate, as long as my minfi noob preprocess was the same as Nate’s.
Summary methods: Minfi noob Beta values Unpaired t-test
If I can replicate results, then I start adding different layers to see if significant associations can be retained while still performing robust QC.
baseDir <- "../data/raw/idatFiles/"
targets <- read.csv(paste0(baseDir, "SampleSheet.csv"))
targets$BasenameOriginal <- as.character(targets$Basename)
targets$Slide <- substr(targets$Basename, 1, 12)
targets$Basename <- paste0(baseDir, targets$Slide, "/", targets$BasenameOriginal)
rgSet <- read.metharray.exp(base = NULL, targets = targets, recursive = TRUE, force = TRUE)
#must load manifest to perform preprocessNoob
Mset <- preprocessNoob(rgSet)
gset <- mapToGenome(Mset)
grset <- ratioConvert(gset, what = "both")
beta_n <- getBeta(grset)
ewas <- data.frame(ID = row.names(beta_n), beta_n)
names(ewas) <- str_replace_all(names(ewas), pattern = "^X", replacement = "")
names(ewas)[1] <- "probeID"
#Targets and sample IDs in same order
sum(names(ewas)[-1] != targets$BasenameOriginal)
## [1] 0
cbind(names(ewas)[-1], targets$BasenameOriginal)
## [,1] [,2]
## [1,] "203839170002_R01C01" "203839170002_R01C01"
## [2,] "203839170002_R02C01" "203839170002_R02C01"
## [3,] "203839170002_R03C01" "203839170002_R03C01"
## [4,] "203839170002_R04C01" "203839170002_R04C01"
## [5,] "203839170002_R05C01" "203839170002_R05C01"
## [6,] "203839170002_R06C01" "203839170002_R06C01"
## [7,] "203839170002_R07C01" "203839170002_R07C01"
## [8,] "203839170002_R08C01" "203839170002_R08C01"
## [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203833040074_R01C01" "203833040074_R01C01"
## [26,] "203833040074_R02C01" "203833040074_R02C01"
## [27,] "203833040074_R03C01" "203833040074_R03C01"
## [28,] "203833040074_R04C01" "203833040074_R04C01"
## [29,] "203833040074_R05C01" "203833040074_R05C01"
## [30,] "203833040074_R06C01" "203833040074_R06C01"
## [31,] "203833040074_R07C01" "203833040074_R07C01"
## [32,] "203833040074_R08C01" "203833040074_R08C01"
#create new variables in sample sheet to indicate contrasts
samp <- targets %>%
mutate(ATV = ifelse(str_detect(ExternalSampleID, "\\+ ATV"), 1L, 0L))
samp$ATV[str_detect(samp$ExternalSampleID, ".*ATV.*", negate = TRUE)] <- NA
cbind(samp$ATV, samp$ExternalSampleID)
## [,1] [,2]
## [1,] NA "C -P 1"
## [2,] NA "C +P 1"
## [3,] NA "mtD -P 1"
## [4,] NA "mtD +P 1"
## [5,] NA "C -P 2"
## [6,] NA "C +P 2"
## [7,] NA "mtD -P 2"
## [8,] NA "mtD +P 2"
## [9,] NA "C -P 3"
## [10,] NA "C +P 3"
## [11,] NA "mtD -P 3"
## [12,] NA "mtD +P 3"
## [13,] NA "C -P 4"
## [14,] NA "C +P 4"
## [15,] NA "mtD -P 4"
## [16,] NA "mtD +P 4"
## [17,] NA "C -P 5"
## [18,] NA "C +P 5"
## [19,] NA "mtD -P 5"
## [20,] NA "mtD +P 5"
## [21,] NA "C -P 6"
## [22,] NA "C +P 6"
## [23,] NA "mtD -P 6"
## [24,] NA "mtD +P 6"
## [25,] "0" " - ATV 1"
## [26,] "1" " + ATV 1"
## [27,] "0" " - ATV 2"
## [28,] "1" " + ATV 2"
## [29,] "0" " - ATV 3"
## [30,] "1" " + ATV 3"
## [31,] "0" " - ATV 4"
## [32,] "1" " + ATV 4"
betaMat <- ewas %>%
dplyr::select(-probeID) %>%
as.matrix
# Association testing
lm_fun <- function(cg){
out <- tryCatch({
lm1 <- t.test(cg ~ ATV, data = samp, na.action = na.omit)
tstat <- lm1$statistic
p <- lm1$p.value
return(c(tstat, p))
}, error = function(e) rep(NA, 2)
)
}
results <- apply(betaMat, 1, lm_fun)
results <- t(results)
colnames(results) <- c("T_STAT", "P")
resultsTB <- as_tibble(results)
resultsTB <- resultsTB %>%
mutate(probeID = row.names(results)) %>%
dplyr::select(probeID, everything())
adjp1 <- mt.rawp2adjp(resultsTB[["P"]], proc = "BH")
resultsTB <- resultsTB %>%
mutate(P_BH = adjp1$adjp[order(adjp1$index), "BH"])
sum(resultsTB$P_BH <= 0.05)
## [1] 180
#Are new results the same as the old results from data from Nate?
datOld <- read_csv("../results/probewise_ATV.csv")
sum(datOld$P_BH <= 0.05)
## [1] 180
names(datOld)[2:4] <- paste("v1", names(datOld)[2:4], sep = "_")
datOld %>%
filter(v1_P_BH <= 0.05) %>%
left_join(resultsTB, by = "probeID") %>%
arrange(v1_P) %>%
slice_head(n = 10) %>%
kable(digits = 9)
| probeID | v1_T_STAT | v1_P | v1_P_BH | T_STAT | P | P_BH |
|---|---|---|---|---|---|---|
| cg16256574 | 37.73952 | 2.40e-08 | 0.01660825 | 37.73952 | 2.40e-08 | 0.01660825 |
| cg13529355 | 36.82122 | 4.70e-08 | 0.01660825 | 36.82122 | 4.70e-08 | 0.01660825 |
| cg02586306 | 35.07179 | 5.80e-08 | 0.01660825 | 35.07179 | 5.80e-08 | 0.01660825 |
| cg16029913 | 28.74487 | 1.38e-07 | 0.02264537 | 28.74487 | 1.38e-07 | 0.02264537 |
| cg13979492 | 36.92012 | 1.50e-07 | 0.02264537 | 36.92012 | 1.50e-07 | 0.02264537 |
| cg16794634 | 26.54316 | 1.96e-07 | 0.02264537 | 26.54316 | 1.96e-07 | 0.02264537 |
| cg23068081 | 26.84196 | 2.06e-07 | 0.02264537 | 26.84196 | 2.06e-07 | 0.02264537 |
| cg16697896 | 30.79678 | 2.12e-07 | 0.02264537 | 30.79678 | 2.12e-07 | 0.02264537 |
| cg17834252 | 36.70854 | 2.41e-07 | 0.02264537 | 36.70854 | 2.41e-07 | 0.02264537 |
| cg14690837 | 33.36396 | 2.62e-07 | 0.02264537 | 33.36396 | 2.62e-07 | 0.02264537 |
datOld %>%
inner_join(resultsTB, by = "probeID") %>%
filter(!is.na(v1_T_STAT)) %>%
filter(!is.na(T_STAT)) %>%
ggplot(aes(x = v1_T_STAT, y = T_STAT)) +
geom_bin2d(binwidth = c(0.8, 0.8)) +
labs(x = "old results (Nate minfi noob t-test)",
y = "new results (Dan minfi noob t-test)",
caption = "Old results are from Nate minfi noob data with Dan t-test. New results are from Dan minfi noob data with Dan t-test."
)
Now that I can replicate my old results, let’s see if simply adding limma on top removes all of the significant findings.
Summary methods: Minfi noob Beta values Limma
baseDir <- "../data/raw/idatFiles/"
targets <- read.csv(paste0(baseDir, "SampleSheet.csv"))
targets$BasenameOriginal <- as.character(targets$Basename)
targets$Slide <- substr(targets$Basename, 1, 12)
targets$Basename <- paste0(baseDir, targets$Slide, "/", targets$BasenameOriginal)
rgSet <- read.metharray.exp(base = NULL, targets = targets, recursive = TRUE, force = TRUE)
#must load manifest to perform preprocessNoob
Mset <- preprocessNoob(rgSet)
gset <- mapToGenome(Mset)
grset <- ratioConvert(gset, what = "both")
bVals <- getBeta(grset)
#Targets and sample IDs in same order
sum(names(bVals) != targets$BasenameOriginal)
## [1] 0
cbind(names(bVals), targets$BasenameOriginal)
## [,1]
## [1,] "203839170002_R01C01"
## [2,] "203839170002_R02C01"
## [3,] "203839170002_R03C01"
## [4,] "203839170002_R04C01"
## [5,] "203839170002_R05C01"
## [6,] "203839170002_R06C01"
## [7,] "203839170002_R07C01"
## [8,] "203839170002_R08C01"
## [9,] "203836210101_R01C01"
## [10,] "203836210101_R02C01"
## [11,] "203836210101_R03C01"
## [12,] "203836210101_R04C01"
## [13,] "203836210101_R05C01"
## [14,] "203836210101_R06C01"
## [15,] "203836210101_R07C01"
## [16,] "203836210101_R08C01"
## [17,] "203839170001_R01C01"
## [18,] "203839170001_R02C01"
## [19,] "203839170001_R03C01"
## [20,] "203839170001_R04C01"
## [21,] "203839170001_R05C01"
## [22,] "203839170001_R06C01"
## [23,] "203839170001_R07C01"
## [24,] "203839170001_R08C01"
## [25,] "203833040074_R01C01"
## [26,] "203833040074_R02C01"
## [27,] "203833040074_R03C01"
## [28,] "203833040074_R04C01"
## [29,] "203833040074_R05C01"
## [30,] "203833040074_R06C01"
## [31,] "203833040074_R07C01"
## [32,] "203833040074_R08C01"
#create new variables in sample sheet to indicate contrasts
translate_trt <- function(ch){
switch(ch,
"Non-senescent DMSO-treated control cells" = "noSen_ATV",
"Non-senescent mtDNA-containing controls no pyruvate" = "noSen_Mt_noPy",
"Non-senescent mtDNA-containing controls with pyruvate" = "noSen_Mt_Py",
"Non-senescent mtDNA-depleted controls cells with pyruvate" = "noSen_noMt_Py",
"Senescent ATV-treated cells" = "sen_ATV",
"Senescent mtDNA-depleted cells no pyruvate" = "sen_noMt_noPy"
)
}
targets <- targets %>%
mutate(treatment = map_chr(Description, translate_trt)) %>%
mutate(treatment = factor(treatment))
design <- model.matrix(~0 + treatment, data = targets)
design
## treatmentnoSen_ATV treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py treatmentsen_ATV treatmentsen_noMt_noPy
## 1 0 0 1 0 0 0
## 2 0 0 1 0 0 0
## 3 0 0 1 0 0 0
## 4 0 0 1 0 0 0
## 5 0 0 1 0 0 0
## 6 0 0 1 0 0 0
## 7 0 1 0 0 0 0
## 8 0 1 0 0 0 0
## 9 0 1 0 0 0 0
## 10 0 1 0 0 0 0
## 11 0 1 0 0 0 0
## 12 0 1 0 0 0 0
## 13 0 0 0 0 0 1
## 14 0 0 0 0 0 1
## 15 0 0 0 0 0 1
## 16 0 0 0 0 0 1
## 17 0 0 0 0 0 1
## 18 0 0 0 0 0 1
## 19 0 0 0 1 0 0
## 20 0 0 0 1 0 0
## 21 0 0 0 1 0 0
## 22 0 0 0 1 0 0
## 23 0 0 0 1 0 0
## 24 0 0 0 1 0 0
## 25 1 0 0 0 0 0
## 26 1 0 0 0 0 0
## 27 1 0 0 0 0 0
## 28 1 0 0 0 0 0
## 29 0 0 0 0 1 0
## 30 0 0 0 0 1 0
## 31 0 0 0 0 1 0
## 32 0 0 0 0 1 0
## attr(,"assign")
## [1] 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$treatment
## [1] "contr.treatment"
colSums(design)
## treatmentnoSen_ATV treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py treatmentsen_ATV treatmentsen_noMt_noPy
## 4 6 6 6 4 6
cm <- makeContrasts(ATVvControl = treatmentsen_ATV - treatmentnoSen_ATV,
SenMidasvNoSenMtnoPy = treatmentsen_noMt_noPy - treatmentnoSen_Mt_noPy,
SenMidasvNoSenNoMtPy = treatmentsen_noMt_noPy - treatmentnoSen_noMt_Py,
levels = design
)
fit <- lmFit(bVals, design)
fit2 <- contrasts.fit(fit, contrasts = cm)
fit2 <- eBayes(fit2)
results <- decideTests(fit2)
summary(results)
## ATVvControl SenMidasvNoSenMtnoPy SenMidasvNoSenNoMtPy
## Down 9 0 22857
## NotSig 865848 865859 838077
## Up 2 0 4925
vennDiagram(results)
volcanoplot(fit2, coef = "ATVvControl" , main = "ATV vs control")
volcanoplot(fit2, coef = "SenMidasvNoSenMtnoPy", main = "MiDAS vs control with MtDNA")
volcanoplot(fit2, coef = "SenMidasvNoSenNoMtPy", main = "MiDAS vs control without mtDNA")
dat <- topTable(fit2, coef = "ATVvControl", n = Inf)
dat <- dat %>%
mutate(probeID = row.names(dat))
topTable(fit2, coef = "ATVvControl", confint = TRUE) %>%
kable()
| logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | |
|---|---|---|---|---|---|---|---|---|
| cg00645473 | 0.0535270 | 0.0405843 | 0.0664696 | 0.8966459 | 8.481569 | 0e+00 | 0.0034060 | 10.763888 |
| cg17652435 | -0.0460689 | -0.0577523 | -0.0343855 | 0.0896966 | -8.086617 | 0e+00 | 0.0043736 | 9.812713 |
| cg05454501 | -0.0229190 | -0.0293537 | -0.0164842 | 0.0354585 | -7.304490 | 1e-07 | 0.0200366 | 7.870240 |
| cg01465169 | -0.0378045 | -0.0488819 | -0.0267272 | 0.0560658 | -6.999000 | 2e-07 | 0.0275752 | 7.091534 |
| cg01971137 | -0.0290129 | -0.0375412 | -0.0204846 | 0.0341193 | -6.976785 | 2e-07 | 0.0275752 | 7.034500 |
| cg26953727 | -0.0429698 | -0.0557349 | -0.0302048 | 0.0761805 | -6.903504 | 2e-07 | 0.0277149 | 6.845977 |
| cg09857158 | -0.0354711 | -0.0461575 | -0.0247847 | 0.0499359 | -6.807232 | 2e-07 | 0.0304142 | 6.597437 |
| cg22621867 | -0.0302225 | -0.0394808 | -0.0209642 | 0.0765190 | -6.694638 | 3e-07 | 0.0317472 | 6.305544 |
| cg18820209 | -0.0315769 | -0.0412587 | -0.0218951 | 0.0684725 | -6.688671 | 3e-07 | 0.0317472 | 6.290039 |
| cg05238741 | -0.0769146 | -0.1006260 | -0.0532031 | 0.0498812 | -6.652392 | 4e-07 | 0.0317472 | 6.195697 |
topTable(fit2, coef = "SenMidasvNoSenMtnoPy", confint = TRUE) %>%
kable()
| logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | |
|---|---|---|---|---|---|---|---|---|
| cg21720999 | 0.0361671 | 0.0246498 | 0.0476845 | 0.8903936 | 6.440044 | 6.0e-07 | 0.2082651 | 5.468084 |
| cg04247553 | 0.0554357 | 0.0377487 | 0.0731227 | 0.7007547 | 6.427798 | 7.0e-07 | 0.2082651 | 5.435857 |
| cg22844780 | 0.1084488 | 0.0732490 | 0.1436485 | 0.2997607 | 6.318473 | 9.0e-07 | 0.2082651 | 5.147576 |
| cg12520327 | -0.0556408 | -0.0738038 | -0.0374778 | 0.1186453 | -6.282491 | 1.0e-06 | 0.2082651 | 5.052478 |
| cg04408387 | 0.4124997 | 0.2741891 | 0.5508104 | 0.5476185 | 6.116385 | 1.5e-06 | 0.2491065 | 4.612160 |
| cg12770471 | 0.0499735 | 0.0328667 | 0.0670804 | 0.6999067 | 5.990967 | 2.1e-06 | 0.2491065 | 4.278385 |
| cg01014399 | -0.1691188 | -0.2270361 | -0.1112015 | 0.3953825 | -5.988395 | 2.1e-06 | 0.2491065 | 4.271530 |
| cg20416110 | -0.0157394 | -0.0211912 | -0.0102877 | 0.0417993 | -5.920801 | 2.5e-06 | 0.2491065 | 4.091209 |
| cg00221525 | -0.0686183 | -0.0924379 | -0.0447988 | 0.1503046 | -5.907908 | 2.6e-06 | 0.2491065 | 4.056781 |
| cg08080985 | -0.0402609 | -0.0546679 | -0.0258539 | 0.0853104 | -5.731093 | 4.1e-06 | 0.3420635 | 3.583737 |
topTable(fit2, coef = "SenMidasvNoSenNoMtPy", confint = TRUE) %>%
kable()
| logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | |
|---|---|---|---|---|---|---|---|---|
| cg26591221 | -0.0256581 | -0.0319110 | -0.0194051 | 0.9650291 | -8.415214 | 0 | 0.0021684 | 10.453092 |
| cg22513166 | -0.0969465 | -0.1212837 | -0.0726094 | 0.2972196 | -8.169375 | 0 | 0.0021684 | 9.858456 |
| cg09107055 | -0.1496877 | -0.1876376 | -0.1117379 | 0.3679817 | -8.089158 | 0 | 0.0021684 | 9.662731 |
| cg00675940 | -0.1656357 | -0.2076895 | -0.1235818 | 0.6466739 | -8.077467 | 0 | 0.0021684 | 9.634140 |
| cg21189727 | -0.0378747 | -0.0476890 | -0.0280604 | 0.0760784 | -7.914374 | 0 | 0.0021684 | 9.233425 |
| cg21069829 | -0.1070371 | -0.1347872 | -0.0792870 | 0.6530591 | -7.910367 | 0 | 0.0021684 | 9.223537 |
| cg00399115 | -0.3699645 | -0.4664988 | -0.2734303 | 0.1286240 | -7.859691 | 0 | 0.0021684 | 9.098313 |
| cg10138791 | -0.2355833 | -0.2984960 | -0.1726706 | 0.5639089 | -7.679504 | 0 | 0.0028219 | 8.650461 |
| cg15313859 | -0.1539862 | -0.1956021 | -0.1123703 | 0.3733631 | -7.588378 | 0 | 0.0028219 | 8.422442 |
| cg05025392 | -0.1497952 | -0.1904469 | -0.1091435 | 0.5391017 | -7.556947 | 0 | 0.0028219 | 8.343556 |
#Compare old and new ATV results
datOld <- read_csv(paste0(dir_out, "probewise_ATV.csv"))
names(datOld)[2:4] <- paste("v1", names(datOld)[2:4], sep = "_")
datOld %>%
filter(v1_P_BH <= 0.05) %>%
left_join(dat, by = "probeID") %>%
dplyr::select(probeID, v1_T_STAT, v1_P, v1_P_BH, t, P.Value, adj.P.Val, everything()) %>%
arrange(v1_P) %>%
slice_head(n = 10)
## # A tibble: 10 x 10
## probeID v1_T_STAT v1_P v1_P_BH t P.Value adj.P.Val logFC AveExpr B
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 cg16256574 37.7 0.0000000235 0.0166 0.132 0.896 1.00 0.0186 0.332 -7.33
## 2 cg13529355 36.8 0.0000000474 0.0166 -0.0752 0.941 1.00 -0.00881 0.423 -7.34
## 3 cg02586306 35.1 0.0000000575 0.0166 0.165 0.870 1.00 0.00437 0.151 -7.33
## 4 cg16029913 28.7 0.000000138 0.0226 -0.0151 0.988 1.00 -0.00161 0.517 -7.34
## 5 cg13979492 36.9 0.000000150 0.0226 -0.00925 0.993 1.00 -0.000942 0.689 -7.34
## 6 cg16794634 26.5 0.000000196 0.0226 -0.204 0.840 1.00 -0.00725 0.824 -7.32
## 7 cg23068081 26.8 0.000000206 0.0226 0.0366 0.971 1.00 0.00117 0.692 -7.34
## 8 cg16697896 30.8 0.000000212 0.0226 0.122 0.904 1.00 0.0107 0.357 -7.34
## 9 cg17834252 36.7 0.000000241 0.0226 -0.0808 0.936 1.00 -0.0115 0.302 -7.34
## 10 cg14690837 33.4 0.000000262 0.0226 -0.206 0.838 1.00 -0.00806 0.0997 -7.32
datMerge <- inner_join(datOld, dat, by = "probeID")
cor1 <- cor(datMerge$v1_T_STAT, datMerge$t, use = "complete.obs")
if(cor1 < 0.1) cor1 <- 0.1
datOld %>%
inner_join(dat, by = "probeID") %>%
filter(!is.na(v1_T_STAT)) %>%
filter(!is.na(t)) %>%
ggplot(aes(x = v1_T_STAT, y = t)) +
geom_hex(binwidth = c(cor1, cor1)) +
labs(x = "old results (minfi noob t-test)",
y = "new results (minfi noob, beta values, limma)",
caption = "Old results are from Nate minfi noob data with Dan t-test. New results are from minfi noob generating beta values that are analyzed with limma"
)
Minfi noob normalized data with unpaired t-test replicates the old results. However, adding limma made the results totally different. What do we trust? Limma is more trustworthy, especially for small sample sizes. The test-statistics for the vanilla t-tests were way too high (39!) to be believed, which is most likely due to unstable variance estimates. Limma takes care of this, so we should move forward with limma.
Limma analysis of beta values still retained significant findings for ATV contrast. Now, what about using M-values with basic minfi noob and limma?
So, this will be the official analysis using minfi processing with additional QC, using Mvalues and limma for association testing.
Create an RGChannelSet object.
Detection P-values. Set to missing. Remove bad samples and probes.
Small detection p-values = good signals.
Detection P-value > 0.05 = bad signal. Set to missing.
baseDir <- "../data/raw/idatFiles/"
targets <- read.csv(paste0(baseDir, "SampleSheet.csv"))
targets$BasenameOriginal <- as.character(targets$Basename)
targets$Slide <- substr(targets$Basename, 1, 12)
targets$Basename <- paste0(baseDir, targets$Slide, "/", targets$BasenameOriginal)
rgSet <- read.metharray.exp(base = NULL, targets = targets, recursive = TRUE, force = TRUE)
#Add nicer sample IDs
sum(colnames(rgSet) != targets$BasenameOriginal)
## [1] 0
cbind(colnames(rgSet), targets$BasenameOriginal)
## [,1] [,2]
## [1,] "203839170002_R01C01" "203839170002_R01C01"
## [2,] "203839170002_R02C01" "203839170002_R02C01"
## [3,] "203839170002_R03C01" "203839170002_R03C01"
## [4,] "203839170002_R04C01" "203839170002_R04C01"
## [5,] "203839170002_R05C01" "203839170002_R05C01"
## [6,] "203839170002_R06C01" "203839170002_R06C01"
## [7,] "203839170002_R07C01" "203839170002_R07C01"
## [8,] "203839170002_R08C01" "203839170002_R08C01"
## [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203833040074_R01C01" "203833040074_R01C01"
## [26,] "203833040074_R02C01" "203833040074_R02C01"
## [27,] "203833040074_R03C01" "203833040074_R03C01"
## [28,] "203833040074_R04C01" "203833040074_R04C01"
## [29,] "203833040074_R05C01" "203833040074_R05C01"
## [30,] "203833040074_R06C01" "203833040074_R06C01"
## [31,] "203833040074_R07C01" "203833040074_R07C01"
## [32,] "203833040074_R08C01" "203833040074_R08C01"
cbind(targets$ExternalSampleID, targets$Description)
## [,1] [,2]
## [1,] "C -P 1" "Non-senescent mtDNA-containing controls no pyruvate"
## [2,] "C +P 1" "Non-senescent mtDNA-containing controls no pyruvate"
## [3,] "mtD -P 1" "Non-senescent mtDNA-containing controls no pyruvate"
## [4,] "mtD +P 1" "Non-senescent mtDNA-containing controls no pyruvate"
## [5,] "C -P 2" "Non-senescent mtDNA-containing controls no pyruvate"
## [6,] "C +P 2" "Non-senescent mtDNA-containing controls no pyruvate"
## [7,] "mtD -P 2" "Non-senescent mtDNA-containing controls with pyruvate"
## [8,] "mtD +P 2" "Non-senescent mtDNA-containing controls with pyruvate"
## [9,] "C -P 3" "Non-senescent mtDNA-containing controls with pyruvate"
## [10,] "C +P 3" "Non-senescent mtDNA-containing controls with pyruvate"
## [11,] "mtD -P 3" "Non-senescent mtDNA-containing controls with pyruvate"
## [12,] "mtD +P 3" "Non-senescent mtDNA-containing controls with pyruvate"
## [13,] "C -P 4" "Senescent mtDNA-depleted cells no pyruvate"
## [14,] "C +P 4" "Senescent mtDNA-depleted cells no pyruvate"
## [15,] "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"
## [16,] "mtD +P 4" "Senescent mtDNA-depleted cells no pyruvate"
## [17,] "C -P 5" "Senescent mtDNA-depleted cells no pyruvate"
## [18,] "C +P 5" "Senescent mtDNA-depleted cells no pyruvate"
## [19,] "mtD -P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [20,] "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [21,] "C -P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [22,] "C +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [23,] "mtD -P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [24,] "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [25,] " - ATV 1" "Non-senescent DMSO-treated control cells"
## [26,] " + ATV 1" "Non-senescent DMSO-treated control cells"
## [27,] " - ATV 2" "Non-senescent DMSO-treated control cells"
## [28,] " + ATV 2" "Non-senescent DMSO-treated control cells"
## [29,] " - ATV 3" "Senescent ATV-treated cells"
## [30,] " + ATV 3" "Senescent ATV-treated cells"
## [31,] " - ATV 4" "Senescent ATV-treated cells"
## [32,] " + ATV 4" "Senescent ATV-treated cells"
targets$sampleID <- ""
targets$sampleID[1:24] <- paste0("midas", 1:24)
targets$sampleID[25:32] <- paste0("ATV", 1:8)
translate_trt <- function(ch){
switch(ch,
"Non-senescent DMSO-treated control cells" = "noSen_ATV",
"Non-senescent mtDNA-containing controls no pyruvate" = "noSen_Mt_noPy",
"Non-senescent mtDNA-containing controls with pyruvate" = "noSen_Mt_Py",
"Non-senescent mtDNA-depleted controls cells with pyruvate" = "noSen_noMt_Py",
"Senescent ATV-treated cells" = "sen_ATV",
"Senescent mtDNA-depleted cells no pyruvate" = "sen_noMt_noPy"
)
}
targets <- targets %>%
mutate(treatment = map_chr(Description, translate_trt)) %>%
mutate(treatment = factor(treatment))
cbind(targets$sampleID, targets$ExternalSampleID, targets$Description, as.character(targets$treatment))
## [,1] [,2] [,3] [,4]
## [1,] "midas1" "C -P 1" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [2,] "midas2" "C +P 1" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [3,] "midas3" "mtD -P 1" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [4,] "midas4" "mtD +P 1" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [5,] "midas5" "C -P 2" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [6,] "midas6" "C +P 2" "Non-senescent mtDNA-containing controls no pyruvate" "noSen_Mt_noPy"
## [7,] "midas7" "mtD -P 2" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [8,] "midas8" "mtD +P 2" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [9,] "midas9" "C -P 3" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [10,] "midas10" "C +P 3" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [11,] "midas11" "mtD -P 3" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [12,] "midas12" "mtD +P 3" "Non-senescent mtDNA-containing controls with pyruvate" "noSen_Mt_Py"
## [13,] "midas13" "C -P 4" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [14,] "midas14" "C +P 4" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [15,] "midas15" "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [16,] "midas16" "mtD +P 4" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [17,] "midas17" "C -P 5" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [18,] "midas18" "C +P 5" "Senescent mtDNA-depleted cells no pyruvate" "sen_noMt_noPy"
## [19,] "midas19" "mtD -P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [20,] "midas20" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [21,] "midas21" "C -P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [22,] "midas22" "C +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [23,] "midas23" "mtD -P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [24,] "midas24" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [25,] "ATV1" " - ATV 1" "Non-senescent DMSO-treated control cells" "noSen_ATV"
## [26,] "ATV2" " + ATV 1" "Non-senescent DMSO-treated control cells" "noSen_ATV"
## [27,] "ATV3" " - ATV 2" "Non-senescent DMSO-treated control cells" "noSen_ATV"
## [28,] "ATV4" " + ATV 2" "Non-senescent DMSO-treated control cells" "noSen_ATV"
## [29,] "ATV5" " - ATV 3" "Senescent ATV-treated cells" "sen_ATV"
## [30,] "ATV6" " + ATV 3" "Senescent ATV-treated cells" "sen_ATV"
## [31,] "ATV7" " - ATV 4" "Senescent ATV-treated cells" "sen_ATV"
## [32,] "ATV8" " + ATV 4" "Senescent ATV-treated cells" "sen_ATV"
cbind(targets$sampleID, targets$ExternalSampleID, as.character(targets$treatment))
## [,1] [,2] [,3]
## [1,] "midas1" "C -P 1" "noSen_Mt_noPy"
## [2,] "midas2" "C +P 1" "noSen_Mt_noPy"
## [3,] "midas3" "mtD -P 1" "noSen_Mt_noPy"
## [4,] "midas4" "mtD +P 1" "noSen_Mt_noPy"
## [5,] "midas5" "C -P 2" "noSen_Mt_noPy"
## [6,] "midas6" "C +P 2" "noSen_Mt_noPy"
## [7,] "midas7" "mtD -P 2" "noSen_Mt_Py"
## [8,] "midas8" "mtD +P 2" "noSen_Mt_Py"
## [9,] "midas9" "C -P 3" "noSen_Mt_Py"
## [10,] "midas10" "C +P 3" "noSen_Mt_Py"
## [11,] "midas11" "mtD -P 3" "noSen_Mt_Py"
## [12,] "midas12" "mtD +P 3" "noSen_Mt_Py"
## [13,] "midas13" "C -P 4" "sen_noMt_noPy"
## [14,] "midas14" "C +P 4" "sen_noMt_noPy"
## [15,] "midas15" "mtD -P 4" "sen_noMt_noPy"
## [16,] "midas16" "mtD +P 4" "sen_noMt_noPy"
## [17,] "midas17" "C -P 5" "sen_noMt_noPy"
## [18,] "midas18" "C +P 5" "sen_noMt_noPy"
## [19,] "midas19" "mtD -P 5" "noSen_noMt_Py"
## [20,] "midas20" "mtD +P 5" "noSen_noMt_Py"
## [21,] "midas21" "C -P 6" "noSen_noMt_Py"
## [22,] "midas22" "C +P 6" "noSen_noMt_Py"
## [23,] "midas23" "mtD -P 6" "noSen_noMt_Py"
## [24,] "midas24" "mtD +P 6" "noSen_noMt_Py"
## [25,] "ATV1" " - ATV 1" "noSen_ATV"
## [26,] "ATV2" " + ATV 1" "noSen_ATV"
## [27,] "ATV3" " - ATV 2" "noSen_ATV"
## [28,] "ATV4" " + ATV 2" "noSen_ATV"
## [29,] "ATV5" " - ATV 3" "sen_ATV"
## [30,] "ATV6" " + ATV 3" "sen_ATV"
## [31,] "ATV7" " - ATV 4" "sen_ATV"
## [32,] "ATV8" " + ATV 4" "sen_ATV"
sampleNames(rgSet) <- targets$sampleID
Set probes to missing if detP > 0.05.
Remove probes that are bad quality (SNP overlap, cross-reactive, design problems).
#Probe and sample filtering
detP <- detectionP(rgSet)
barplot(colMeans(detP), las = 2, cex.names = 0.8, ylab = "Mean detection p-values")
abline(h = 0.05, col = "red")
qcReport(rgSet, sampNames = targets$sampleID, sampGroups = as.character(targets$treatment),
pdf = "minfiQCreport_v2.pdf")
## png
## 2
#remove bad samples from rgSet
keep <- colMeans(detP) < 0.05
keep
## midas1 midas2 midas3 midas4 midas5 midas6 midas7 midas8 midas9 midas10 midas11 midas12 midas13 midas14 midas15 midas16 midas17 midas18 midas19 midas20 midas21 midas22 midas23 midas24 ATV1 ATV2 ATV3
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## ATV4 ATV5 ATV6 ATV7 ATV8
## TRUE TRUE TRUE TRUE TRUE
rgSet <- rgSet[,keep]
rgSet
## class: RGChannelSet
## dim: 1051815 32
## metadata(0):
## assays(2): Green Red
## rownames(1051815): 1600101 1600111 ... 99810990 99810992
## rowData names(0):
## colnames(32): midas1 midas2 ... ATV7 ATV8
## colData names(22): OriginalOrder ExternalSampleID ... Slide filenames
## Annotation
## array: IlluminaHumanMethylationEPIC
## annotation: ilm10b4.hg19
#remove bad samples from targets
targets <- targets[keep,]
#remove bad samples from detP matrix
detP <- detP[, keep]
#Normalization
#must load manifest to perform preprocessNoob
mSet <- preprocessNoob(rgSet)
gset <- mapToGenome(mSet)
grSet <- ratioConvert(gset, what = "both")
#After normalization, can exclude probes
# Probe exclusion based on design problems
xreact_450 <- read_xlsx("~/bigdata/EWAStools/arrayAnnotation/48639-non-specific-probes-Illumina450k.xlsx", sheet = "nonspecific cg probes")
annot_EPIC <- read_tsv("~/bigdata/EWAStools/arrayAnnotation/EPIC.hg19.manifest.tsv")
##
## -- Column specification ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## cols(
## .default = col_double(),
## CpG_chrm = col_character(),
## probe_strand = col_character(),
## probeID = col_character(),
## channel = col_character(),
## designType = col_character(),
## nextBase = col_character(),
## nextBaseRef = col_character(),
## probeType = col_character(),
## orientation = col_character(),
## ProbeSeq_A = col_character(),
## ProbeSeq_B = col_character(),
## gene = col_character(),
## gene_HGNC = col_character(),
## chrm_A = col_character(),
## cigar_A = col_character(),
## chrm_B = col_character(),
## cigar_B = col_character(),
## wDecoy_chrm_A = col_character(),
## wDecoy_cigar_A = col_character(),
## wDecoy_chrm_B = col_character()
## # ... with 13 more columns
## )
## i Use `spec()` for the full column specifications.
table(annot_EPIC$MASK_general[annot_EPIC$probeID %in% xreact_450$TargetID])
##
## FALSE TRUE
## 1956 25268
annot_EPIC$MASK_general[annot_EPIC$probeID %in% xreact_450$TargetID] <- TRUE
arrayProbes <- featureNames(grSet)
annot_EPIC <- annot_EPIC[match(arrayProbes, annot_EPIC$probeID), ]
sum(arrayProbes != annot_EPIC$probeID)
## [1] 0
grSet <- grSet[!annot_EPIC$MASK_general,]
dim(grSet)
## [1] 764543 32
#Probe exclusion based on detP
arrayProbes <- featureNames(grSet)
detP <- detP[match(arrayProbes, rownames(detP)), ]
# remove probes that failed in one or more samples
keep <- rowSums(detP < 0.05) == ncol(grSet)
table(keep)
## keep
## FALSE TRUE
## 1932 762611
grSet <- grSet[keep,]
# Data exploration with PCA and boxplots
Mvals <- getM(grSet)
boxplot(Mvals, las = 2, ylab = "M-values")
group <- targets$treatment
col.group <- group
levels(col.group) <- brewer.pal(nlevels(col.group), "Dark2")
col.group.ch <- as.character(col.group)
plotMDS(Mvals, gene.selection = "common", col = col.group.ch, pch = 16)
legend("topleft", fill = levels(col.group), legend = levels(group))
cbind(levels(group), levels(col.group))
## [,1] [,2]
## [1,] "noSen_ATV" "#1B9E77"
## [2,] "noSen_Mt_Py" "#D95F02"
## [3,] "noSen_Mt_noPy" "#7570B3"
## [4,] "noSen_noMt_Py" "#E7298A"
## [5,] "sen_ATV" "#66A61E"
## [6,] "sen_noMt_noPy" "#E6AB02"
cbind(colnames(Mvals), as.character(group), col.group.ch)
## col.group.ch
## [1,] "midas1" "noSen_Mt_noPy" "#7570B3"
## [2,] "midas2" "noSen_Mt_noPy" "#7570B3"
## [3,] "midas3" "noSen_Mt_noPy" "#7570B3"
## [4,] "midas4" "noSen_Mt_noPy" "#7570B3"
## [5,] "midas5" "noSen_Mt_noPy" "#7570B3"
## [6,] "midas6" "noSen_Mt_noPy" "#7570B3"
## [7,] "midas7" "noSen_Mt_Py" "#D95F02"
## [8,] "midas8" "noSen_Mt_Py" "#D95F02"
## [9,] "midas9" "noSen_Mt_Py" "#D95F02"
## [10,] "midas10" "noSen_Mt_Py" "#D95F02"
## [11,] "midas11" "noSen_Mt_Py" "#D95F02"
## [12,] "midas12" "noSen_Mt_Py" "#D95F02"
## [13,] "midas13" "sen_noMt_noPy" "#E6AB02"
## [14,] "midas14" "sen_noMt_noPy" "#E6AB02"
## [15,] "midas15" "sen_noMt_noPy" "#E6AB02"
## [16,] "midas16" "sen_noMt_noPy" "#E6AB02"
## [17,] "midas17" "sen_noMt_noPy" "#E6AB02"
## [18,] "midas18" "sen_noMt_noPy" "#E6AB02"
## [19,] "midas19" "noSen_noMt_Py" "#E7298A"
## [20,] "midas20" "noSen_noMt_Py" "#E7298A"
## [21,] "midas21" "noSen_noMt_Py" "#E7298A"
## [22,] "midas22" "noSen_noMt_Py" "#E7298A"
## [23,] "midas23" "noSen_noMt_Py" "#E7298A"
## [24,] "midas24" "noSen_noMt_Py" "#E7298A"
## [25,] "ATV1" "noSen_ATV" "#1B9E77"
## [26,] "ATV2" "noSen_ATV" "#1B9E77"
## [27,] "ATV3" "noSen_ATV" "#1B9E77"
## [28,] "ATV4" "noSen_ATV" "#1B9E77"
## [29,] "ATV5" "sen_ATV" "#66A61E"
## [30,] "ATV6" "sen_ATV" "#66A61E"
## [31,] "ATV7" "sen_ATV" "#66A61E"
## [32,] "ATV8" "sen_ATV" "#66A61E"
## Identify PCA outliers
myMDS <- plotMDS(Mvals, gene.selection = "common", plot = FALSE)
cluster_top <- myMDS$y[myMDS$y > 0.2 ]
cluster_bottom <- myMDS$x[myMDS$x < (-1) ]
cluster_ATV <- myMDS$x[myMDS$x > 2 ]
#cluster top
targets %>%
filter(sampleID %in% names(cluster_top)) %>%
dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
kable
| sampleID | SampleLabel | treatment | Description |
|---|---|---|---|
| midas1 | C -P 1 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas2 | C -P 2 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas5 | C -P 5 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas6 | C -P 6 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas9 | C +P 3 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas10 | C +P 4 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas13 | mtD -P 1 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas14 | mtD -P 2 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas17 | mtD -P 5 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas18 | mtD -P 6 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas21 | mtD +P 3 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas22 | mtD +P 4 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
#cluster bottom
targets %>%
filter(sampleID %in% names(cluster_bottom)) %>%
dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
kable
| sampleID | SampleLabel | treatment | Description |
|---|---|---|---|
| midas3 | C -P 3 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas4 | C -P 4 | noSen_Mt_noPy | Non-senescent mtDNA-containing controls no pyruvate |
| midas7 | C +P 1 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas8 | C +P 2 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas11 | C +P 5 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas12 | C +P 6 | noSen_Mt_Py | Non-senescent mtDNA-containing controls with pyruvate |
| midas15 | mtD -P 3 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas16 | mtD -P 4 | sen_noMt_noPy | Senescent mtDNA-depleted cells no pyruvate |
| midas19 | mtD +P 1 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas20 | mtD +P 2 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas23 | mtD +P 5 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
| midas24 | mtD +P 6 | noSen_noMt_Py | Non-senescent mtDNA-depleted controls cells with pyruvate |
#cluster ATV
targets %>%
filter(sampleID %in% names(cluster_ATV)) %>%
dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
kable
| sampleID | SampleLabel | treatment | Description |
|---|---|---|---|
| ATV1 | - ATV | noSen_ATV | Non-senescent DMSO-treated control cells |
| ATV2 | - ATV | noSen_ATV | Non-senescent DMSO-treated control cells |
| ATV3 | - ATV | noSen_ATV | Non-senescent DMSO-treated control cells |
| ATV4 | - ATV | noSen_ATV | Non-senescent DMSO-treated control cells |
| ATV5 | + ATV | sen_ATV | Senescent ATV-treated cells |
| ATV6 | + ATV | sen_ATV | Senescent ATV-treated cells |
| ATV7 | + ATV | sen_ATV | Senescent ATV-treated cells |
| ATV8 | + ATV | sen_ATV | Senescent ATV-treated cells |
Using limma with Mvals
#Targets and sample IDs in same order
sum(colnames(Mvals) != targets$sampleID)
## [1] 0
cbind(colnames(Mvals), targets$sampleID)
## [,1] [,2]
## [1,] "midas1" "midas1"
## [2,] "midas2" "midas2"
## [3,] "midas3" "midas3"
## [4,] "midas4" "midas4"
## [5,] "midas5" "midas5"
## [6,] "midas6" "midas6"
## [7,] "midas7" "midas7"
## [8,] "midas8" "midas8"
## [9,] "midas9" "midas9"
## [10,] "midas10" "midas10"
## [11,] "midas11" "midas11"
## [12,] "midas12" "midas12"
## [13,] "midas13" "midas13"
## [14,] "midas14" "midas14"
## [15,] "midas15" "midas15"
## [16,] "midas16" "midas16"
## [17,] "midas17" "midas17"
## [18,] "midas18" "midas18"
## [19,] "midas19" "midas19"
## [20,] "midas20" "midas20"
## [21,] "midas21" "midas21"
## [22,] "midas22" "midas22"
## [23,] "midas23" "midas23"
## [24,] "midas24" "midas24"
## [25,] "ATV1" "ATV1"
## [26,] "ATV2" "ATV2"
## [27,] "ATV3" "ATV3"
## [28,] "ATV4" "ATV4"
## [29,] "ATV5" "ATV5"
## [30,] "ATV6" "ATV6"
## [31,] "ATV7" "ATV7"
## [32,] "ATV8" "ATV8"
design <- model.matrix(~0 + treatment, data = targets)
design
## treatmentnoSen_ATV treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py treatmentsen_ATV treatmentsen_noMt_noPy
## 1 0 0 1 0 0 0
## 2 0 0 1 0 0 0
## 3 0 0 1 0 0 0
## 4 0 0 1 0 0 0
## 5 0 0 1 0 0 0
## 6 0 0 1 0 0 0
## 7 0 1 0 0 0 0
## 8 0 1 0 0 0 0
## 9 0 1 0 0 0 0
## 10 0 1 0 0 0 0
## 11 0 1 0 0 0 0
## 12 0 1 0 0 0 0
## 13 0 0 0 0 0 1
## 14 0 0 0 0 0 1
## 15 0 0 0 0 0 1
## 16 0 0 0 0 0 1
## 17 0 0 0 0 0 1
## 18 0 0 0 0 0 1
## 19 0 0 0 1 0 0
## 20 0 0 0 1 0 0
## 21 0 0 0 1 0 0
## 22 0 0 0 1 0 0
## 23 0 0 0 1 0 0
## 24 0 0 0 1 0 0
## 25 1 0 0 0 0 0
## 26 1 0 0 0 0 0
## 27 1 0 0 0 0 0
## 28 1 0 0 0 0 0
## 29 0 0 0 0 1 0
## 30 0 0 0 0 1 0
## 31 0 0 0 0 1 0
## 32 0 0 0 0 1 0
## attr(,"assign")
## [1] 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$treatment
## [1] "contr.treatment"
colSums(design)
## treatmentnoSen_ATV treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py treatmentsen_ATV treatmentsen_noMt_noPy
## 4 6 6 6 4 6
cm <- makeContrasts(ATVvControl = treatmentsen_ATV - treatmentnoSen_ATV,
SenMidasvNoSenMtnoPy = treatmentsen_noMt_noPy - treatmentnoSen_Mt_noPy,
SenMidasvNoSenNoMtPy = treatmentsen_noMt_noPy - treatmentnoSen_noMt_Py,
levels = design
)
fit <- lmFit(Mvals, design)
fit2 <- contrasts.fit(fit, contrasts = cm)
fit2 <- eBayes(fit2)
results <- decideTests(fit2)
summary(results)
## ATVvControl SenMidasvNoSenMtnoPy SenMidasvNoSenNoMtPy
## Down 0 0 46144
## NotSig 762611 762611 704604
## Up 0 0 11863
vennDiagram(results)
volcanoplot(fit2, coef = "ATVvControl" , main = "ATV vs control")
volcanoplot(fit2, coef = "SenMidasvNoSenMtnoPy", main = "MiDAS vs control with MtDNA")
volcanoplot(fit2, coef = "SenMidasvNoSenNoMtPy", main = "MiDAS vs control without mtDNA")
topTable(fit2, coef = "ATVvControl", confint = TRUE) %>%
kable()
| logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | |
|---|---|---|---|---|---|---|---|---|
| cg16201883 | 0.9526968 | 0.6725338 | 1.2328599 | 3.690292 | 6.942084 | 1.0e-07 | 0.0752688 | 1.9772962 |
| cg05454501 | -0.7314665 | -0.9648308 | -0.4981023 | -4.810318 | -6.398913 | 4.0e-07 | 0.1619798 | 1.4212744 |
| cg05238741 | -1.4559297 | -1.9302228 | -0.9816366 | -4.455730 | -6.266710 | 6.0e-07 | 0.1619798 | 1.2795059 |
| cg17367243 | -0.6020866 | -0.8134781 | -0.3906951 | 1.031197 | -5.814567 | 2.3e-06 | 0.4320972 | 0.7760306 |
| cg07002602 | 0.6153742 | 0.3949186 | 0.8358297 | 5.764520 | 5.698549 | 3.1e-06 | 0.4796175 | 0.6423252 |
| cg05045343 | -0.6911336 | -0.9490007 | -0.4332664 | -4.942574 | -5.471571 | 6.0e-06 | 0.6965711 | 0.3756587 |
| cg17890984 | -0.4563705 | -0.6282654 | -0.2844757 | -6.148649 | -5.420017 | 6.9e-06 | 0.6965711 | 0.3141833 |
| cg23569120 | 0.9853088 | 0.6124049 | 1.3582128 | 5.765368 | 5.394130 | 7.4e-06 | 0.6965711 | 0.2831897 |
| cg08749686 | 0.5386579 | 0.3326707 | 0.7446452 | 3.591951 | 5.338493 | 8.7e-06 | 0.6965711 | 0.2163056 |
| cg23304785 | -0.6944412 | -0.9609108 | -0.4279717 | -4.133183 | -5.320273 | 9.2e-06 | 0.6965711 | 0.1943221 |
topTable(fit2, coef = "SenMidasvNoSenMtnoPy", confint = TRUE) %>%
kable()
| logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | |
|---|---|---|---|---|---|---|---|---|
| cg12520327 | -0.7460243 | -0.9934968 | -0.4985519 | -2.9236813 | -6.154207 | 9.0e-07 | 0.4846866 | -0.5323738 |
| cg20416110 | -0.5665818 | -0.7618443 | -0.3713193 | -4.5358308 | -5.923655 | 1.7e-06 | 0.4846866 | -0.7022375 |
| cg00221525 | -0.7687916 | -1.0382295 | -0.4993537 | -2.5574900 | -5.825000 | 2.2e-06 | 0.4846866 | -0.7767012 |
| cg04408387 | 3.5065388 | 2.2424641 | 4.7706135 | 0.0947063 | 5.663070 | 3.5e-06 | 0.4846866 | -0.9012067 |
| cg08080985 | -0.7171454 | -0.9823526 | -0.4519382 | -3.4521468 | -5.520367 | 5.2e-06 | 0.4846866 | -1.0132445 |
| cg08626888 | 0.5848446 | 0.3677380 | 0.8019511 | -3.9550422 | 5.499377 | 5.5e-06 | 0.4846866 | -1.0299045 |
| cg13178372 | -0.6007792 | -0.8261185 | -0.3754398 | 4.0665387 | -5.442818 | 6.5e-06 | 0.4846866 | -1.0750233 |
| cg04455450 | -0.5703644 | -0.7843846 | -0.3563441 | -3.3200356 | -5.440559 | 6.5e-06 | 0.4846866 | -1.0768323 |
| cg16979575 | -0.6244025 | -0.8600630 | -0.3887420 | -4.0162692 | -5.409085 | 7.1e-06 | 0.4846866 | -1.1020904 |
| cg01613696 | -0.5698075 | -0.7849632 | -0.3546517 | -3.6111283 | -5.406562 | 7.2e-06 | 0.4846866 | -1.1041193 |
topTable(fit2, coef = "SenMidasvNoSenNoMtPy", confint = TRUE) %>%
kable()
| logFC | CI.L | CI.R | AveExpr | t | P.Value | adj.P.Val | B | |
|---|---|---|---|---|---|---|---|---|
| cg26591221 | -1.1088454 | -1.3706771 | -0.8470136 | 4.8466724 | -8.645590 | 0e+00 | 0.0007757 | 11.552813 |
| cg00675940 | -1.1067892 | -1.3757625 | -0.8378158 | 0.9044313 | -8.400432 | 0e+00 | 0.0007757 | 11.013136 |
| cg00399115 | -4.0520858 | -5.0640349 | -3.0401367 | -3.9722647 | -8.174590 | 0e+00 | 0.0007757 | 10.507458 |
| cg02810043 | 4.6869597 | 3.5115466 | 5.8623728 | -2.9951128 | 8.140418 | 0e+00 | 0.0007757 | 10.430239 |
| cg21189727 | -0.7224616 | -0.9174104 | -0.5275129 | -3.6299923 | -7.565547 | 0e+00 | 0.0027958 | 9.103977 |
| cg00329052 | -1.2690241 | -1.6145267 | -0.9235215 | 1.0432169 | -7.498330 | 0e+00 | 0.0027958 | 8.945626 |
| cg09107055 | -0.9070941 | -1.1580408 | -0.6561475 | -0.8023139 | -7.379330 | 0e+00 | 0.0029760 | 8.663656 |
| cg19345940 | -0.8044017 | -1.0272980 | -0.5815055 | 1.8514201 | -7.367433 | 0e+00 | 0.0029760 | 8.635353 |
| cg00864346 | -0.7356627 | -0.9425844 | -0.5287411 | 2.5877843 | -7.258032 | 0e+00 | 0.0035499 | 8.374136 |
| cg26815147 | 1.0939721 | 0.7815802 | 1.4063641 | -2.1266227 | 7.149118 | 1e-07 | 0.0039605 | 8.112397 |
resultAll <- topTable(fit2, coef = "ATVvControl", confint = TRUE, sort.by = "none", number = Inf)
write_csv(resultAll, path = "../results/limma_ATV.csv")
sessionInfo()
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so
##
## locale:
## [1] C
##
## attached base packages:
## [1] stats4 parallel stats graphics utils datasets grDevices methods base
##
## other attached packages:
## [1] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0 IlluminaHumanMethylationEPICmanifest_0.3.0 minfi_1.34.0 bumphunter_1.30.0
## [5] locfit_1.5-9.4 iterators_1.0.13 foreach_1.5.1 Biostrings_2.56.0
## [9] XVector_0.28.0 SummarizedExperiment_1.18.2 DelayedArray_0.14.1 matrixStats_0.57.0
## [13] GenomicRanges_1.40.0 GenomeInfoDb_1.24.2 IRanges_2.22.2 S4Vectors_0.26.1
## [17] RColorBrewer_1.1-2 limma_3.44.3 multtest_2.44.0 Biobase_2.48.0
## [21] wheatmap_0.1.0 sesame_1.6.0 sesameData_1.6.0 ExperimentHub_1.14.2
## [25] AnnotationHub_2.20.2 BiocFileCache_1.12.1 dbplyr_1.4.4 BiocGenerics_0.34.0
## [29] knitr_1.30 readxl_1.3.1 forcats_0.5.0 stringr_1.4.0
## [33] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
## [37] tibble_3.0.4 ggplot2_3.3.2 tidyverse_1.3.0 BiocStyle_2.16.1
##
## loaded via a namespace (and not attached):
## [1] backports_1.1.10 plyr_1.8.6 splines_4.0.1 BiocParallel_1.22.0 digest_0.6.26 htmltools_0.5.0 magick_2.5.0
## [8] fansi_0.4.1 magrittr_1.5 memoise_1.1.0 annotate_1.66.0 modelr_0.1.8 siggenes_1.62.0 askpass_1.1
## [15] prettyunits_1.1.1 colorspace_1.4-1 blob_1.2.1 rvest_0.3.6 rappdirs_0.3.1 haven_2.3.1 xfun_0.18
## [22] hexbin_1.28.1 crayon_1.3.4 RCurl_1.98-1.2 jsonlite_1.7.1 genefilter_1.70.0 GEOquery_2.56.0 survival_3.2-7
## [29] glue_1.4.2 gtable_0.3.0 zlibbioc_1.34.0 Rhdf5lib_1.10.1 HDF5Array_1.16.1 scales_1.1.1 DBI_1.1.0
## [36] rngtools_1.5 Rcpp_1.0.5 xtable_1.8-4 progress_1.2.2 mclust_5.4.6 bit_4.0.4 preprocessCore_1.50.0
## [43] httr_1.4.2 ellipsis_0.3.1 farver_2.0.3 reshape_0.8.8 pkgconfig_2.0.3 XML_3.99-0.5 utf8_1.1.4
## [50] DNAcopy_1.62.0 labeling_0.4.2 tidyselect_1.1.0 rlang_0.4.8 later_1.1.0.1 AnnotationDbi_1.50.3 munsell_0.5.0
## [57] BiocVersion_3.11.1 cellranger_1.1.0 tools_4.0.1 cli_2.1.0 generics_0.0.2 RSQLite_2.2.1 broom_0.7.2
## [64] evaluate_0.14 fastmap_1.0.1 yaml_2.2.1 bit64_4.0.5 fs_1.5.0 beanplot_1.2 scrime_1.3.5
## [71] randomForest_4.6-14 nlme_3.1-149 doRNG_1.8.2 mime_0.9 nor1mix_1.3-0 xml2_1.3.2 biomaRt_2.44.4
## [78] compiler_4.0.1 rstudioapi_0.11 curl_4.3 interactiveDisplayBase_1.26.3 reprex_0.3.0 stringi_1.5.3 highr_0.8
## [85] ps_1.4.0 GenomicFeatures_1.40.1 lattice_0.20-41 Matrix_1.2-18 vctrs_0.3.4 pillar_1.4.6 lifecycle_0.2.0
## [92] BiocManager_1.30.10 data.table_1.13.2 bitops_1.0-6 httpuv_1.5.4 rtracklayer_1.48.0 R6_2.4.1 bookdown_0.21
## [99] promises_1.1.1 codetools_0.2-16 MASS_7.3-53 assertthat_0.2.1 rhdf5_2.32.4 openssl_1.4.3 withr_2.3.0
## [106] GenomicAlignments_1.24.0 Rsamtools_2.4.0 GenomeInfoDbData_1.2.3 hms_0.5.3 quadprog_1.5-8 grid_4.0.1 base64_2.0
## [113] rmarkdown_2.5 DelayedMatrixStats_1.10.1 illuminaio_0.30.0 shiny_1.5.0 lubridate_1.7.9